The Metropolis-adjusted Langevin (MALA) algorithm is a sampling
algorithm which makes local moves by incorporating information
about the gradient of the logarithm of the target density. In this paper 
we study the efficiency of MALA on a natural class of target measures 
supported on an infinite dimensional Hilbert space. These natural mea- 
sures have density with respect to a Gaussian random field measure 
and arise in many applications such as Bayesian nonparametric statis- 
tics and the theory of conditioned diffusions. We prove that, started in 
stationarity, a suitably interpolated and scaled version of the Markov 
chain corresponding to MALA converges to an infinite dimensional
diffusion process. Our results imply that, in stationarity, the MALA
algorithm applied to an $N$-dimensional approximation of the target will
take $O(N^{1/3})$ steps to explore the invariant measure, comparing
favorably with the Random Walk Metropolis which was recently shown
to require $O(N)$ steps when applied to the same class of problems.
As a by-product of the diffusion limit, it also follows that the MALA 
algorithm is optimized at an average acceptance probability of 0.574. 
Previous results were proved only for targets which are products of 
one-dimensional distributions, or for variants of this situation, limiting 
their applicability. The correlation in our target means that the rescaled 
MALA algorithm converges weakly to an infinite dimensional Hilbert 
space valued diffusion, and the limit cannot be described through anal- 
ysis of scalar diffusions. The limit theorem is proved by showing that 
a drift-martingale decomposition of the Markov chain, suitably scaled, 
closely resembles a weak Euler-Maruyama discretization of the puta- 
tive limit. An invariance principle is proved for the martingale, and a 
continuous mapping argument is used to complete the proof. 
1. Introduction. Sampling probability distributions $\pi^N$ in $\mathbb{R}^N$ for $N$
large is of interest in numerous applications arising in applied probability
and statistics. The Markov chain Monte Carlo (MCMC) methodology [21]
provides a framework for many algorithms which effect this sampling. It is
hence of interest to quantify the computational cost of MCMC methods as a
function of dimension $N$. This paper is part of a research program designed
to develop the analysis of MCMC in high dimensions so that it may be
usefully applied to understand target measures which arise in applications. The
simplest class of target measures for which analysis can be carried out are
perhaps target distributions $\pi^N$ of the form

(1.1)  $\frac{d\pi^N}{d\lambda^N}(x) = \prod_{i=1}^{N} f(x_i).$

Here $\lambda^N(dx)$ is the $N$-dimensional Lebesgue measure, and $f(x)$ is a
one-dimensional probability density function. Thus $\pi^N$ has the form of an i.i.d.
product. Using understanding gained in this situation, we will develop an
analysis that is relevant to an important class of nonproduct measures
which arise in a range of applications.

We start by describing the MCMC methods which are studied in this
paper. Consider a $\pi^N$-invariant Metropolis-Hastings Markov chain $\{x^{k,N}\}_{k \ge 1}$.
From the current state $x$, we propose $y$ drawn from the kernel $q(x,y)$; this
is then accepted with probability

$\alpha(x,y) = 1 \wedge \frac{\pi^N(y)\, q(y,x)}{\pi^N(x)\, q(x,y)}.$

Two widely used proposals are the random walk proposal (obtained from
the discrete approximation of Brownian motion)

(1.2)  $y = x + \sqrt{2\delta}\, Z^N, \qquad Z^N \sim N(0, I_N),$

and the Langevin proposal (obtained from the time discretization of the
Langevin diffusion)

(1.3)  $y = x + \delta \nabla \log \pi^N(x) + \sqrt{2\delta}\, Z^N, \qquad Z^N \sim N(0, I_N).$

Here $2\delta$ is the proposal variance, a parameter quantifying the size of the
discrete time increment; we will consider "local proposals" for which $\delta$ is
small. The Markov chain corresponding to proposal (1.2) is the Random
Walk Metropolis (RWM) algorithm [20], and the Markov transition rule
constructed from the proposal (1.3) is known as the Metropolis Adjusted
Langevin Algorithm (MALA) [21]. This paper is aimed at analyzing the
computational complexity of the MALA algorithm in high dimensions.
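
As a minimal sketch of the two proposals (1.2) and (1.3) inside a single Metropolis-Hastings step, one might write the following. The function name `mh_step` and the callables `log_pi` and `grad_log_pi` are illustrative assumptions, not notation from the paper; the target is assumed to be supplied through its log-density (up to a constant) and its gradient.

```python
import numpy as np

def mh_step(x, log_pi, grad_log_pi, delta, rng, langevin=True):
    """One Metropolis-Hastings step using proposal (1.3) if langevin, else (1.2)."""
    def mean(u):
        # Proposal mean: gradient drift for MALA, current state for RWM.
        return u + delta * grad_log_pi(u) if langevin else u

    def log_q(u, v):
        # log q(u, v) up to an additive constant: v ~ N(mean(u), 2 * delta * I).
        return -np.sum((v - mean(u)) ** 2) / (4.0 * delta)

    y = mean(x) + np.sqrt(2.0 * delta) * rng.standard_normal(x.shape)
    # Logarithm of pi(y) q(y, x) / (pi(x) q(x, y)); accept with probability 1 ^ ratio.
    log_ratio = log_pi(y) + log_q(y, x) - log_pi(x) - log_q(x, y)
    accepted = np.log(rng.uniform()) < min(0.0, log_ratio)
    return (y if accepted else x), bool(accepted)
```

For the random walk proposal the kernel is symmetric, so the two `log_q` terms cancel and only the ratio of target densities remains.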

A fruitful way to quantify the computational cost of these Markov chains
which proceed via local proposals is to determine the "optimal" size of
increment $\delta$ as a function of dimension $N$ (the precise notion of optimality
is discussed below). A simple heuristic suggests the existence of such an
"optimal scale" for $\delta$: smaller values of the proposal variance lead to high
acceptance rates, but the chain does not move much even when accepted,
and therefore may not be efficient. Larger values of the proposal variance
lead to larger moves, but then the acceptance probability is tiny. The
optimal scale for the proposal variance strikes a balance between making large
moves and still having a reasonable acceptance probability. In order to
quantify this idea it is useful to define a continuous interpolant of the Markov
chain as follows:

(1.4)  $z^N(t) = \left(\frac{t}{\Delta t} - k\right) x^{k+1,N} + \left(k + 1 - \frac{t}{\Delta t}\right) x^{k,N}$

for $k\Delta t \le t < (k+1)\Delta t$.
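
The piecewise linear interpolant of a stored chain can be evaluated as in the following sketch. The name `interpolant` and the array layout (row `k` holds the state at step `k`) are assumptions made for illustration only.

```python
import numpy as np

def interpolant(chain, dt, t):
    """Evaluate the piecewise linear interpolant at time t.

    chain: array of shape (K + 1, N); row k is the chain state at step k.
    dt:    the time increment between consecutive states.
    """
    s = t / dt
    # Index of the interval [k * dt, (k + 1) * dt) containing t,
    # clamped so that the final time t = K * dt is also handled.
    k = min(int(np.floor(s)), len(chain) - 2)
    return (s - k) * chain[k + 1] + (k + 1 - s) * chain[k]
```

With `dt` equal to $N^{-\gamma}$, one unit of interpolated time corresponds to $N^{\gamma}$ steps of the chain.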

We choose the proposal variance to satisfy $\delta = \ell \Delta t$, with $\Delta t = N^{-\gamma}$ setting
the scale in terms of dimension and the parameter $\ell$ a "tuning" parameter
which is independent of the dimension $N$. Key questions, then, concern the
choice of $\gamma$ and $\ell$. If $z^N$ converges weakly to a suitable stationary diffusion
process, then it is natural to deduce that the number of Markov chain steps
required in stationarity is inversely proportional to the proposal variance,
and hence to $\Delta t$, and so grows like $N^{\gamma}$. The parametric dependence of
the limiting diffusion process then provides a selection mechanism for $\ell$.
A research program along these lines was initiated by Roberts and coworkers
in the pair of papers [22, 23]. These papers concerned the RWM and MALA
algorithms, respectively, when applied to the target (1.1). In both cases it
was shown that the projection of $z^N$ into any single fixed coordinate direction
$x_i$ converges weakly in $C([0,T];\mathbb{R})$ to $z$, the scalar diffusion process

(1.5)  $\frac{dz}{dt} = h(\ell)\,[\log f(z)]' + \sqrt{2h(\ell)}\,\frac{dW}{dt}$

for $h(\ell) > 0$, a constant determined by the parameter $\ell$ from the proposal
variance. For RWM the scaling of the proposal variance to achieve this limit
is determined by the choice $\gamma = 1$ [22], while for MALA $\gamma = \frac{1}{3}$ [23]. The
analysis shows that the number of steps required to sample the target measure
grows as $O(N)$ for RWM, but only as $O(N^{1/3})$ for MALA. This quantifies
the efficiency gained by use of MALA over RWM, and in particular from
employing local moves informed by the gradient of the logarithm of the
target density. A second important feature of the analysis is that it suggests
that the optimal choice of $\ell$ is that which maximizes $h(\ell)$. This value of $\ell$
leads, in both cases, to a universal [independent of $f(\cdot)$] optimal average
acceptance probability (to three significant figures) of 0.234 for RWM and
0.574 for MALA.
These theoretical analyses have had a huge practical impact as the
optimal acceptance probabilities send a concrete message to practitioners: one
should "tune" the proposal variance of the RWM and MALA algorithms so 
as to have acceptance probabilities of 0.234 and 0.574, respectively. How- 
ever, practitioners use these tuning criteria far outside the class of target 
distributions given by (1.1). It is natural to ask whether they are wise to 
do so. Extensive simulations (see [24, 26]) show that these optimality re- 
sults also hold for more complex target distributions. Furthermore, a range 
of subsequent theoretical analyses confirmed that the optimal scaling ideas 
do indeed extend beyond (1.1); these papers studied slightly more compli- 
cated models, such as products of one-dimensional distributions with differ- 
ent variances and elliptically symmetric distributions [1, 2, 9, 11]. However, 
the diffusion limits obtained remain essentially one dimensional in all of 
these extensions.^1 In this paper we study considerably more complex target
distributions which are not of the product form, and the limiting diffusion 
takes values in an infinite dimensional space. 

Our perspective on these problems is motivated by applications such as
Bayesian nonparametric statistics, for example, in application to inverse
problems [27], and the theory of conditioned diffusions [15]. In both these
areas the target measure of interest, $\pi$, is on an infinite dimensional real
separable Hilbert space $\mathcal{H}$ and, for Gaussian priors (inverse problems) or
additive noise (diffusions), is absolutely continuous with respect to a Gaussian
measure $\pi_0$ on $\mathcal{H}$ with mean zero and covariance operator $C$. This
framework for the analysis of MCMC in high dimensions was first studied in the
papers [6-8]. The Radon-Nikodym derivative defining the target measure is
assumed to have the form

(1.6)  $\frac{d\pi}{d\pi_0}(x) = M_\Psi \exp(-\Psi(x))$

for a real-valued functional $\Psi : \mathcal{H}^s \to \mathbb{R}$ defined on a subspace $\mathcal{H}^s \subset \mathcal{H}$ that
contains the support of the reference measure $\pi_0$; here $M_\Psi$ is a normalizing
constant. We are interested in studying MCMC methods applied to finite
dimensional approximations of this measure found by projecting onto the
first $N$ eigenfunctions of the covariance operator $C$ of the Gaussian reference
measure $\pi_0$.

It is proved in [12, 16, 17] that the measure $\pi$ is invariant for $\mathcal{H}$-valued
SDEs (or stochastic PDEs, SPDEs) with the form

(1.7)  $\frac{dz}{dt} = -h(\ell)\,(z + C\nabla\Psi(z)) + \sqrt{2h(\ell)}\,\frac{dW}{dt}, \qquad z(0) = z^0,$

where $W$ is a Brownian motion (see [12]) in $\mathcal{H}$ with covariance operator $C$.
In [19] the RWM algorithm is studied when applied to a sequence of finite
dimensional approximations of $\pi$ as in (1.6). The continuous time interpolant
of the Markov chain $z^N$ given by (1.4) is shown to converge weakly to $z$
solving (1.7) in $C([0,T];\mathcal{H}^s)$. Furthermore, as for the i.i.d. target measure, the
scaling of the proposal variance which achieves this scaling limit is inversely
proportional to $N$ (i.e., corresponds to the exponent $\gamma = 1$), and the speed of
the limiting diffusion process is maximized at the same universal acceptance
probability of 0.234 that was found in the i.i.d. case. Thus, remarkably, the
i.i.d. case has been of fundamental importance in understanding MCMC
methods applied to complex infinite dimensional probability measures
arising in practice. The paper [19] developed an approach for deriving diffusion
limits for such algorithms, using ideas from numerical analysis. We can build
on these techniques to derive scaling limits for a wide range of
Metropolis-Hastings algorithms with local proposals.

^1 The paper [10] contains an infinite dimensional diffusion limit, but we have been unable
to employ the techniques of that paper.

The purpose of this article is to develop the techniques in the context 
of the MALA algorithm. To the best of our knowledge, the only paper to 
consider the optimal scaling for the MALA algorithm for nonproduct targets 
is [9], in the context of nonlinear regression. In [9] the target measure has 
a structure similar to that of the mean field models studied in statistical 
mechanics and hence behaves asymptotically like a product measure when 
the dimension goes to infinity. Thus the diffusion limit obtained in [9] is 
finite dimensional. 

The main contribution of our work is the proof of a diffusion limit for the
output of the MALA algorithm, suitably interpolated, to the SPDE (1.7),
when applied to $N$-dimensional approximations of the target measures (1.6)
with proposal variance inversely proportional to $N^{1/3}$. Moreover we show
that the speed $h(\ell)$ of the limiting diffusion is maximized for an average
acceptance probability of 0.574, just as in the i.i.d. product scenario [23].
Thus in this regard, our work is the first extension of the remarkable
results in [23] for the Langevin algorithm to target measures which are not of
product form. This adds theoretical weight to the results observed in
computational experiments which demonstrate the robustness of the optimality
criteria developed in [22, 23]. In particular, the paper [7] shows numerical
results indicating the need to scale time-step as a function of dimension to
obtain $O(1)$ acceptance probabilities.

In Section 2 we state the main theorem of the paper, having defined 
precisely the setting in which it holds. Section 3 contains the proof of the 
main theorem, postponing the proof of a number of key technical estimates 
to Section 4. In Section 5 we conclude by summarizing and providing the 
outlook for further research in this area. 

2. Main theorem. This section is devoted to stating the main theorem of
the article. However, the setting is complex, and we develop it in a
step-by-step fashion, before the theorem statement. In Section 2.1 we introduce the
form of the reference, or prior, Gaussian measure $\pi_0$, followed in Section 2.2
by the change of measure which induces a genuinely nonproduct structure.
In Section 2.3 we describe finite dimensional approximation of the measure,
enabling us to define application of a variant MALA-type algorithm in
Section 2.4. We then discuss in Section 2.5 how the choice of scaling used in
the theorem emerges from study of the acceptance probabilities. Finally, in
Section 2.6, we state the main theorem.

Throughout the paper we use the following notation in order to compare 
sequences and to denote conditional expectations: 

• Two sequences $\{\alpha_n\}$ and $\{\beta_n\}$ satisfy $\alpha_n \lesssim \beta_n$ if there exists a constant
$K > 0$ satisfying $\alpha_n \le K\beta_n$ for all $n \ge 0$. The notation $\alpha_n \asymp \beta_n$ means
that $\alpha_n \lesssim \beta_n$ and $\beta_n \lesssim \alpha_n$.

• Two sequences of real functions $\{f_n\}$ and $\{g_n\}$ defined on the same set $D$
satisfy $f_n \lesssim g_n$ if there exists a constant $K > 0$ satisfying $f_n(x) \le K g_n(x)$
for all $n \ge 0$ and all $x \in D$. The notation $f_n \asymp g_n$ means that $f_n \lesssim g_n$ and
$g_n \lesssim f_n$.

• The notation $\mathbb{E}_x[f(x,\xi)]$ denotes expectation with respect to $\xi$ with the
variable $x$ fixed.

2.1. Gaussian reference measure. Let $\mathcal{H}$ be a separable Hilbert space
of real valued functions with scalar product denoted by $\langle \cdot, \cdot \rangle$ and associated
norm $\|x\|^2 = \langle x, x \rangle$. Consider a Gaussian probability measure $\pi_0$ on $(\mathcal{H}, \|\cdot\|)$
with covariance operator $C$. The general theory of Gaussian measures [12]
ensures that the operator $C$ is positive and trace class. Let $\{\varphi_j, \lambda_j^2\}_{j \ge 1}$ be
the eigenfunctions and eigenvalues of the covariance operator $C$:

$C\varphi_j = \lambda_j^2 \varphi_j, \qquad j \ge 1.$

We assume a normalization under which the family $\{\varphi_j\}_{j \ge 1}$ forms a
complete orthonormal basis in the Hilbert space $\mathcal{H}$, which we refer to as the
Karhunen-Loève basis. Any function $x \in \mathcal{H}$ can be represented in this basis
via the expansion

(2.1)  $x = \sum_{j=1}^{\infty} x_j \varphi_j, \qquad x_j \overset{\mathrm{def}}{=} \langle x, \varphi_j \rangle.$

Throughout this paper we will often identify the function $x$ with its
coordinates $\{x_j\}_{j=1}^{\infty} \in \ell^2$ in this eigenbasis, moving freely between the two
representations. The Karhunen-Loève expansion (see [12], Section White
noise expansions) refers to the fact that a realization $x$ from the Gaussian
measure $\pi_0$ can be expressed by allowing the coordinates $\{x_j\}_{j \ge 1}$ in (2.1)
to be independent random variables distributed as $x_j \sim N(0, \lambda_j^2)$. Thus, in
the coordinates $\{x_j\}_{j \ge 1}$, the Gaussian reference measure $\pi_0$ has a product
structure.
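
The product structure makes a truncated draw from the reference measure easy to simulate. The sketch below assumes, purely for illustration, the concrete eigenvalue choice $\lambda_j = j^{-\kappa}$ (the paper only assumes decay of this order, see Assumption 2.1); the function name `sample_prior` is hypothetical.

```python
import numpy as np

def sample_prior(N, kappa, rng, size=1):
    """Draw the first N Karhunen-Loeve coordinates x_j = lambda_j * xi_j.

    Assumes lambda_j = j**(-kappa) for illustration; xi_j are i.i.d. N(0, 1).
    Returns an array of shape (size, N), one draw per row.
    """
    j = np.arange(1, N + 1)
    lam = j ** (-float(kappa))
    return lam * rng.standard_normal((size, N))
```

Each column `j` then has variance $\lambda_j^2$, and distinct columns are independent.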
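For every $x \in \mathcal{H}$ we have representation (2.1). Using this expansion, we
define Sobolev-like spaces $\mathcal{H}^r$, $r \in \mathbb{R}$, with the inner-products and norms
defined by

(2.2)  $\langle x, y \rangle_r \overset{\mathrm{def}}{=} \sum_{j=1}^{\infty} j^{2r} x_j y_j, \qquad \|x\|_r^2 \overset{\mathrm{def}}{=} \sum_{j=1}^{\infty} j^{2r} x_j^2.$

Notice that $\mathcal{H}^0 = \mathcal{H}$ and $\mathcal{H}^r \subset \mathcal{H} \subset \mathcal{H}^{-r}$ for any $r > 0$. The Hilbert-Schmidt
norm $\|\cdot\|_C$ associated to the covariance operator $C$ is defined as

$\|x\|_C^2 = \sum_j \lambda_j^{-2} x_j^2.$

For $x, y \in \mathcal{H}^r$, the outer product operator in $\mathcal{H}^r$ is the operator
$x \otimes_{\mathcal{H}^r} y : \mathcal{H}^r \to \mathcal{H}^r$ defined by $(x \otimes_{\mathcal{H}^r} y) z \overset{\mathrm{def}}{=} \langle y, z \rangle_r\, x$ for every $z \in \mathcal{H}^r$. For $r \in \mathbb{R}$,
let $B_r : \mathcal{H} \to \mathcal{H}$ denote the operator which is diagonal in the basis $\{\varphi_j\}_{j \ge 1}$
with diagonal entries $j^{2r}$. The operator $B_r$ satisfies $B_r \varphi_j = j^{2r} \varphi_j$ so that
$B_r^{1/2} \varphi_j = j^{r} \varphi_j$. The operator $B_r$ lets us alternate between the Hilbert space
$\mathcal{H}$ and the Sobolev spaces $\mathcal{H}^r$ via the identities $\langle x, y \rangle_r = \langle B_r^{1/2} x, B_r^{1/2} y \rangle$.
Since $\|B_r^{-1/2} \varphi_k\|_r = \|\varphi_k\| = 1$, we deduce that $\{B_r^{-1/2} \varphi_k\}_{k \ge 1}$ forms an
orthonormal basis for $\mathcal{H}^r$. For a positive, self-adjoint operator $D : \mathcal{H} \to \mathcal{H}$, we
define its trace in $\mathcal{H}^r$ by

(2.3)  $\mathrm{Tr}_{\mathcal{H}^r}(D) \overset{\mathrm{def}}{=} \sum_{j=1}^{\infty} \langle B_r^{-1/2} \varphi_j,\; D\, B_r^{-1/2} \varphi_j \rangle_r.$

Since $\mathrm{Tr}_{\mathcal{H}^r}(D)$ does not depend on the orthonormal basis, the operator $D$
is said to be trace class in $\mathcal{H}^r$ if $\mathrm{Tr}_{\mathcal{H}^r}(D) < \infty$ for some, and hence any,
orthonormal basis of $\mathcal{H}^r$. Let us define the operator $C_r \overset{\mathrm{def}}{=} B_r^{1/2} C B_r^{1/2}$. Notice
that $\mathrm{Tr}_{\mathcal{H}^r}(C_r) = \sum_{j=1}^{\infty} \lambda_j^2 j^{2r}$. It is shown that under the condition

(2.4)  $\mathrm{Tr}_{\mathcal{H}^r}(C_r) < \infty,$

the support of $\pi_0$ is included in $\mathcal{H}^r$ in the sense that $\pi_0$-almost every function
$x \in \mathcal{H}$ belongs to $\mathcal{H}^r$. Furthermore, the induced distribution of $\pi_0$ on $\mathcal{H}^r$ is
identical to that of a centered Gaussian measure on $\mathcal{H}^r$ with covariance
operator $C_r$. For example, if $\xi \sim \pi_0$, then $\mathbb{E}[\langle \xi, u \rangle_r \langle \xi, v \rangle_r] = \langle u, C_r v \rangle_r$ for any
functions $u, v \in \mathcal{H}^r$. Thus in what follows, we alternate between the Gaussian
measures $N(0, C)$ on $\mathcal{H}$ and $N(0, C_r)$ on $\mathcal{H}^r$, for those $r$ for which (2.4) holds.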
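
To make the norm (2.2) and the trace condition (2.4) concrete, the following sketch evaluates them in coordinates. It again assumes $\lambda_j = j^{-\kappa}$ for illustration, and the function names are hypothetical; under that assumption the partial sums of the trace stabilize exactly when $r < \kappa - 1/2$.

```python
import numpy as np

def sobolev_norm(x, r):
    """||x||_r of (2.2) for a truncated coordinate vector x = (x_1, ..., x_N)."""
    x = np.asarray(x, dtype=float)
    j = np.arange(1, len(x) + 1, dtype=float)
    return np.sqrt(np.sum(j ** (2.0 * r) * x ** 2))

def trace_partial_sum(N, kappa, r):
    """First N terms of sum_j lambda_j^2 j^(2r), assuming lambda_j = j**(-kappa)."""
    j = np.arange(1, N + 1, dtype=float)
    return np.sum(j ** (2.0 * r - 2.0 * kappa))
```

For instance, with $\kappa = 3/2$ and $r = 1/2$ the series is $\sum_j j^{-2}$, which converges, whereas $\kappa = 1$ and $r = 1/2$ gives the divergent harmonic series.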

2.2. Change of measure. Our goal is to sample from a measure $\pi$ defined
through the change of probability formula (1.6). As described in Section 2.1,
the condition $\mathrm{Tr}_{\mathcal{H}^r}(C_r) < \infty$ implies that the measure $\pi_0$ has full support on
$\mathcal{H}^r$, that is, $\pi_0(\mathcal{H}^r) = 1$. Consequently, if $\mathrm{Tr}_{\mathcal{H}^r}(C_r) < \infty$, the functional $\Psi(\cdot)$
needs only to be defined on $\mathcal{H}^r$ in order for the change of probability formula
(1.6) to be valid. In this section, we give assumptions on the decay of the
eigenvalues of the covariance operator $C$ of $\pi_0$ that ensure the existence of
a real number $s > 0$ such that $\pi_0$ has full support on $\mathcal{H}^s$. The functional
$\Psi(\cdot)$ is assumed to be defined on $\mathcal{H}^s$, and we impose regularity assumptions
on $\Psi(\cdot)$ that ensure that the probability distribution $\pi$ is not too different
from $\pi_0$, when projected into directions associated with $\varphi_j$ for $j$ large. For
each $x \in \mathcal{H}^s$ the derivative $\nabla\Psi(x)$ is an element of the dual $(\mathcal{H}^s)^*$ of $\mathcal{H}^s$,
comprising linear functionals on $\mathcal{H}^s$. However, we may identify $(\mathcal{H}^s)^*$ with
$\mathcal{H}^{-s}$ and view $\nabla\Psi(x)$ as an element of $\mathcal{H}^{-s}$ for each $x \in \mathcal{H}^s$. With this
identification, the following identity holds:

$\|\nabla\Psi(x)\|_{\mathcal{L}(\mathcal{H}^s, \mathbb{R})} = \|\nabla\Psi(x)\|_{-s},$

and the second derivative $\partial^2\Psi(x)$ can be identified as an element of $\mathcal{L}(\mathcal{H}^s, \mathcal{H}^{-s})$.

To avoid technicalities we assume that $\Psi(\cdot)$ is quadratically bounded, with
the first derivative linearly bounded, and the second derivative globally
bounded. Weaker assumptions could be dealt with by use of stopping time
arguments.

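Assumption 2.1. The covariance operator $C$ and functional $\Psi$ satisfy
the following:

(1) Decay of eigenvalues $\lambda_j^2$ of $C$: there is an exponent $\kappa > \frac{1}{2}$ such that

(2.5)  $\lambda_j \asymp j^{-\kappa}.$

(2) Assumptions on $\Psi$: There exist constants $M_i \in \mathbb{R}$, $i \le 4$, and $s \in [0, \kappa - 1/2)$
such that for all $x \in \mathcal{H}^s$ the functional $\Psi : \mathcal{H}^s \to \mathbb{R}$ satisfies

(2.6)  $M_1 \le \Psi(x) \le M_2 (1 + \|x\|_s^2),$

(2.7)  $\|\nabla\Psi(x)\|_{-s} \le M_3 (1 + \|x\|_s),$

(2.8)  $\|\partial^2\Psi(x)\|_{\mathcal{L}(\mathcal{H}^s, \mathcal{H}^{-s})} \le M_4.$

Remark 2.2. The condition $\kappa > \frac{1}{2}$ ensures that the covariance operator
$C$ is trace class in $\mathcal{H}$. In fact, equation (2.4) shows that $C_r$ is trace-class
in $\mathcal{H}^r$ for any $r < \kappa - \frac{1}{2}$. It follows that $\pi_0$ has full measure in $\mathcal{H}^r$ for any
$r \in [0, \kappa - 1/2)$. In particular $\pi_0$ has full support on $\mathcal{H}^s$.

Remark 2.3. The functional $\Psi(x) = \frac{1}{2}\|x\|_s^2$ satisfies Assumption 2.1.
It is defined on $\mathcal{H}^s$ and its derivative at $x \in \mathcal{H}^s$ is given by
$\nabla\Psi(x) = \sum_{j \ge 1} j^{2s} x_j \varphi_j \in \mathcal{H}^{-s}$ with $\|\nabla\Psi(x)\|_{-s} = \|x\|_s$. The second derivative
$\partial^2\Psi(x) \in \mathcal{L}(\mathcal{H}^s, \mathcal{H}^{-s})$ is the linear operator that maps $u \in \mathcal{H}^s$ to
$\sum_{j \ge 1} j^{2s} \langle u, \varphi_j \rangle \varphi_j \in \mathcal{H}^{-s}$: its norm satisfies
$\|\partial^2\Psi(x)\|_{\mathcal{L}(\mathcal{H}^s, \mathcal{H}^{-s})} = 1$ for any $x \in \mathcal{H}^s$.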
Since the eigenvalues $\lambda_j^2$ of $C$ decrease as $\lambda_j \asymp j^{-\kappa}$, the operator $C$ has a
smoothing effect: $C^{\alpha} h$ gains $2\alpha\kappa$ orders of regularity in the sense that the
$\mathcal{H}^{\beta}$-norm of $C^{\alpha} h$ is controlled by the $\mathcal{H}^{\beta - 2\alpha\kappa}$-norm of $h \in \mathcal{H}$. Indeed, under
Assumption 2.1, the following estimates hold:

(2.9)  $\|h\|_C \asymp \|h\|_{\kappa}$ and $\|C^{\alpha} h\|_{\beta} \asymp \|h\|_{\beta - 2\alpha\kappa}.$

The proof follows the methodology used to prove Lemma 3.3 of [19]. The
reader is referred to this text for more details.



2.3. Finite dimensional approximation. We are interested in finite
dimensional approximations of the probability distribution $\pi$. To this end,
we introduce the vector space spanned by the first $N$ eigenfunctions of the
covariance operator,

$X^N \overset{\mathrm{def}}{=} \mathrm{span}\{\varphi_1, \varphi_2, \ldots, \varphi_N\}.$

Notice that $X^N \subset \mathcal{H}^r$ for any $r \in [0, +\infty)$. In particular, $X^N$ is a subspace
of $\mathcal{H}^s$. Next, we define $N$-dimensional approximations of the functional $\Psi(\cdot)$
and of the reference measure $\pi_0$. To this end, we introduce the orthogonal
projection on $X^N$ denoted by $P^N : \mathcal{H}^s \to X^N \subset \mathcal{H}^s$. The functional $\Psi(\cdot)$ is
approximated by the functional $\Psi^N : X^N \to \mathbb{R}$ defined by

(2.10)  $\Psi^N \overset{\mathrm{def}}{=} \Psi \circ P^N.$

The approximation $\pi_0^N$ of the reference measure $\pi_0$ is the Gaussian measure
on $X^N$ given by the law of the random variable

$\pi_0^N \sim \sum_{j=1}^{N} \lambda_j \xi_j \varphi_j = (C^N)^{1/2} \xi^N,$

where $\xi_j$ are i.i.d. standard Gaussian random variables, $\xi^N = \sum_{j=1}^{N} \xi_j \varphi_j$
and $C^N = P^N \circ C \circ P^N$. Consequently we have $\pi_0^N = N(0, C^N)$. Finally, one can
define the approximation $\pi^N$ of $\pi$ by the change of probability formula

(2.11)  $\frac{d\pi^N}{d\pi_0^N}(x) = M_{\Psi^N} \exp(-\Psi^N(x)),$

where $M_{\Psi^N}$ is a normalization constant. Notice that the probability
distribution $\pi^N$ is supported on $X^N$ and has Lebesgue density^2 on $X^N$ equal
to

(2.12)  $\pi^N(x) \propto \exp\left(-\tfrac{1}{2}\|x\|_{C^N}^2 - \Psi^N(x)\right).$



^2 For ease of notation we do not distinguish between a measure and its density, nor do
we distinguish between the representation of the measure in $X^N$ or in coordinates in $\mathbb{R}^N$.
In formula (2.12), the Hilbert-Schmidt norm $\|\cdot\|_{C^N}$ on $X^N$ is given by the
scalar product $\langle u, v \rangle_{C^N} = \langle u, (C^N)^{-1} v \rangle$ for all $u, v \in X^N$. The operator $C^N$
is invertible on $X^N$ because the eigenvalues of $C$ are assumed to be strictly
positive. The quantity $C^N \nabla \log \pi^N(x)$ is repeatedly used in the text and, in
particular, appears in the function $\mu^N(x)$ given by

(2.13)  $\mu^N(x) = -\left(P^N x + C^N \nabla\Psi^N(x)\right)$

which, up to an additive constant, is $C^N \nabla \log \pi^N(x)$. This function is the
drift of an ergodic Langevin diffusion that leaves $\pi^N$ invariant. Similarly,
one defines the function $\mu : \mathcal{H}^s \to \mathcal{H}^s$ given by

(2.14)  $\mu(x) = -\left(x + C\nabla\Psi(x)\right)$

which can informally be seen as $C \nabla \log \pi(x)$, up to an additive constant. In
the sequel, Lemma 4.1 shows that, for $\pi_0$-almost every function $x \in \mathcal{H}$, we
have $\lim_{N \to \infty} \mu^N(x) = \mu(x)$. This quantifies the manner in which $\mu^N(\cdot)$ is
an approximation of $\mu(\cdot)$.

The next lemma gathers various regularity estimates on the functional
$\Psi(\cdot)$ and $\Psi^N(\cdot)$ that are repeatedly used in the sequel. These are simple
consequences of Assumption 2.1, and proofs can be found in [19].

Lemma 2.4 (Properties of $\Psi$). Let the functional $\Psi(\cdot)$ satisfy
Assumption 2.1 and consider the functional $\Psi^N(\cdot)$ defined by equation (2.10). The
following estimates hold:

(1) The functionals $\Psi^N : \mathcal{H}^s \to \mathbb{R}$ satisfy the same conditions imposed on
$\Psi$ given by equations (2.6), (2.7) and (2.8) with constants that can be chosen
independent of $N$.

(2) The function $C\nabla\Psi : \mathcal{H}^s \to \mathcal{H}^s$ is globally Lipschitz on $\mathcal{H}^s$: there exists
a constant $M_5 > 0$ such that

$\|C\nabla\Psi(x) - C\nabla\Psi(y)\|_s \le M_5 \|x - y\|_s \qquad \forall x, y \in \mathcal{H}^s.$

Moreover, the functions $C^N \nabla\Psi^N : \mathcal{H}^s \to \mathcal{H}^s$ also satisfy this estimate with a
constant that can be chosen independently of $N$.

(3) The functional $\Psi(\cdot) : \mathcal{H}^s \to \mathbb{R}$ satisfies a second order Taylor formula.^3
There exists a constant $M_6 > 0$ such that

(2.15)  $\Psi(y) - \left(\Psi(x) + \langle \nabla\Psi(x), y - x \rangle\right) \le M_6 \|x - y\|_s^2 \qquad \forall x, y \in \mathcal{H}^s.$

Moreover, the functionals $\Psi^N(\cdot)$ also satisfy this estimate with a constant
that can be chosen independently of $N$.

Remark 2.5. Regularity Lemma 2.4 shows, in particular, that the
function $\mu : \mathcal{H}^s \to \mathcal{H}^s$ defined by (2.14) is globally Lipschitz on $\mathcal{H}^s$. Similarly, it
follows that $C^N \nabla\Psi^N : \mathcal{H}^s \to \mathcal{H}^s$ and $\mu^N : \mathcal{H}^s \to \mathcal{H}^s$ given by (2.13) are
globally Lipschitz with Lipschitz constants that can be chosen uniformly in $N$.



^3 We extend $\langle \cdot, \cdot \rangle$ from an inner-product on $\mathcal{H}$ to the dual pairing between $\mathcal{H}^{-s}$ and $\mathcal{H}^{s}$.
2.4. The algorithm. The MALA algorithm is defined in this section. This
method is motivated by the fact that the probability measure $\pi^N$ defined by
equation (2.11) is invariant with respect to the Langevin diffusion process

(2.16)  $\frac{dz}{dt} = \mu^N(z) + \sqrt{2}\,\frac{dW^N}{dt},$

where $W^N$ is a Brownian motion in $\mathcal{H}$ with covariance operator $C^N$. The drift
function $\mu^N : \mathcal{H}^s \to \mathcal{H}^s$ is the gradient of the log-density of $\pi^N$, as described
by equation (2.13). The idea of the MALA algorithm is to make a proposal
based on Euler-Maruyama discretization of the diffusion (2.16). To this end
we consider, from state $x \in X^N$, proposals $y \in X^N$ given by

(2.17)  $y - x = \delta \mu^N(x) + \sqrt{2\delta}\,(C^N)^{1/2} \xi^N$ where $\delta = \ell N^{-1/3}$

with $\xi^N = \sum_{i=1}^{N} \xi_i \varphi_i$ and $\xi_i \sim N(0,1)$. Notice that $(C^N)^{1/2} \xi^N \sim N(0, C^N)$.
The quantity $\delta$ is the time-step in an Euler-Maruyama discretization of
(2.16). We introduce a related parameter

$\Delta t := \ell^{-1} \delta = N^{-1/3}$

which will be the natural time-step for the limiting diffusion process derived
from the proposal above, after inclusion of an accept-reject mechanism. The
scaling of $\Delta t$, and hence $\delta$, with $N$ will ensure that the average acceptance
probability is of order 1 as $N$ grows. This is discussed in more detail in
Section 2.5. The quantity $\ell > 0$ is a fixed parameter which can be chosen to
maximize the speed of the limiting diffusion process; see the discussion in
the Introduction and after the Main Theorem below.

We will study the Markov chain $x^N = \{x^{k,N}\}_{k \ge 0}$ resulting from
Metropolizing this proposal when it is started at stationarity: the initial position $x^{0,N}$ is
distributed as $\pi^N$ and thus lies in $X^N$. Therefore, the Markov chain evolves
in $X^N$; as a consequence, only the first $N$ components of an expansion in the
eigenbasis of $C$ are nonzero, and the algorithm can be implemented in $\mathbb{R}^N$.
However the analysis is cleaner when written in $X^N \subset \mathcal{H}^s$. The acceptance
probability only depends on the first $N$ coordinates of $x$ and $y$ and has the
form

(2.18)  $\alpha^N(x, \xi^N) = 1 \wedge \frac{\pi^N(y)\, T^N(y, x)}{\pi^N(x)\, T^N(x, y)} = 1 \wedge e^{Q^N(x, \xi^N)},$

where the proposal $y$ is given by equation (2.17). The function $T^N(\cdot, \cdot)$ is
the density of the Langevin proposals (2.17) and is given by

$T^N(x, y) \propto \exp\left\{ -\frac{1}{4\delta} \|y - x - \delta \mu^N(x)\|_{C^N}^2 \right\}.$
The local mean acceptance probability $\alpha^N(x)$ is defined by

(2.19)  $\alpha^N(x) = \mathbb{E}_x[\alpha^N(x, \xi^N)].$

It is the expected acceptance probability when the algorithm stands at
$x \in \mathcal{H}$. The Markov chain $x^N = \{x^{k,N}\}_{k \ge 0}$ can also be expressed as

(2.20)  $x^{k+1,N} = \gamma^{k,N} y^{k,N} + (1 - \gamma^{k,N})\, x^{k,N},$

where $y^{k,N}$ is the proposal (2.17) made from $x^{k,N}$ with noise $\xi^{k,N}$, the $\xi^{k,N}$
are i.i.d. samples distributed as $\xi^N$, and $\gamma^{k,N} = \gamma^N(x^{k,N}, \xi^{k,N})$
creates a Bernoulli random sequence with $k$th success probability
$\alpha^N(x^{k,N}, \xi^{k,N})$. We may view the Bernoulli random variable as
$\gamma^{k,N} = 1_{\{U^k < \alpha^N(x^{k,N}, \xi^{k,N})\}}$,
where $U^k \sim \mathrm{Uniform}(0,1)$ is independent from $x^{k,N}$ and $\xi^{k,N}$. The quantity
$Q^N$ defined in equation (2.18) may be expressed as

(2.21)  $Q^N(x, \xi^N) = -\frac{1}{2}\left(\|y\|_{C^N}^2 - \|x\|_{C^N}^2\right) - \left(\Psi^N(y) - \Psi^N(x)\right) - \frac{1}{4\delta}\left\{ \|x - y - \delta\mu^N(y)\|_{C^N}^2 - \|y - x - \delta\mu^N(x)\|_{C^N}^2 \right\}.$

As will be seen in the next section, a key idea behind our diffusion limit is
that, for large $N$, the quantity $Q^N(x, \xi^N)$ behaves like a Gaussian random
variable independent from the current position $x$.

In summary, the Markov chain that we have described in $\mathcal{H}^s$ is, when
projected onto $X^N$, equivalent to a standard MALA algorithm on $\mathbb{R}^N$ for the
Lebesgue density (2.12). Recall that the target measure $\pi$ in (1.6) is the
invariant measure of the SPDE (1.7). Our goal is to obtain an invariance
principle for the continuous interpolant (1.4) of the Markov chain $x^N = \{x^{k,N}\}_{k \ge 0}$
started in stationarity, that is, to show weak convergence in $C([0,T];\mathcal{H}^s)$ of
$z^N(t)$ to the solution $z(t)$ of the SPDE (1.7), as the dimension $N \to \infty$.
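
As an illustrative sketch (not the authors' code), the algorithm of this section can be written in Karhunen-Loève coordinates as follows. Here `lam` holds the values $\lambda_j$, and `psi` and `grad_psi` are hypothetical callables returning $\Psi^N$ and its coordinate gradient; the function name `mala_chain` is likewise an assumption.

```python
import numpy as np

def mala_chain(x0, lam, psi, grad_psi, ell, n_steps, rng):
    """MALA on X^N in Karhunen-Loeve coordinates with delta = ell * N**(-1/3).

    lam:           array of lambda_j (square roots of the eigenvalues of C).
    psi, grad_psi: Psi^N and its gradient in coordinates (assumed callables).
    Returns the chain, shape (n_steps + 1, N), and the observed acceptance rate.
    """
    N = lam.size
    delta = ell * N ** (-1.0 / 3.0)
    c_norm2 = lambda u: np.sum((u / lam) ** 2)       # ||u||^2 in the C^N norm
    mu = lambda u: -(u + lam ** 2 * grad_psi(u))      # drift (2.13)

    def log_accept(x, y):
        # Q^N(x, xi^N): log of pi(y) T(y, x) / (pi(x) T(x, y)).
        fwd = c_norm2(y - x - delta * mu(x))
        bwd = c_norm2(x - y - delta * mu(y))
        return (-0.5 * (c_norm2(y) - c_norm2(x))
                - (psi(y) - psi(x)) - (bwd - fwd) / (4.0 * delta))

    x = np.array(x0, dtype=float)
    chain, n_acc = [x.copy()], 0
    for _ in range(n_steps):
        xi = rng.standard_normal(N)
        y = x + delta * mu(x) + np.sqrt(2.0 * delta) * lam * xi   # proposal (2.17)
        if np.log(rng.uniform()) < min(0.0, log_accept(x, y)):     # accept w.p. 1 ^ e^Q
            x, n_acc = y, n_acc + 1
        chain.append(x.copy())
    return np.array(chain), n_acc / n_steps
```

Starting the chain from a draw of $\pi^N$ corresponds to the stationarity assumption of the analysis; for $\Psi = 0$ such a draw is simply `lam * rng.standard_normal(N)`.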



2.5. Optimal scale $\gamma = \frac{1}{3}$. In this section, we informally describe why
the optimal scale for the MALA proposals (2.17) is given by the exponent
$\gamma = \frac{1}{3}$. For product-form target probability described by equation (1.1), the
optimality of the exponent $\gamma = \frac{1}{3}$ was first obtained in [23]. For further
discussion, see also [6]. To keep the exposition simple in this explanatory
subsection, we focus on the case $\Psi(\cdot) = 0$. The analysis is similar with a
nonvanishing function $\Psi(\cdot)$, because absolute continuity ensures that the
effect of $\Psi(\cdot)$ is small compared to the dominant Gaussian effects described
here. Inclusion of nonvanishing $\Psi(\cdot)$ is carried out in Lemma 4.4.

In the case $\Psi(\cdot) = 0$, straightforward algebra shows that the acceptance
probability $\alpha^N(x, \xi^N) = 1 \wedge e^{Q^N(x, \xi^N)}$ satisfies

$Q^N(x, \xi^N) = -\frac{\ell \Delta t}{4}\left(\|y\|_{C^N}^2 - \|x\|_{C^N}^2\right).$
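
For completeness, the algebra can be spelled out as follows. This is a worked sketch using only the definitions above: with $\Psi = 0$ the drift on $X^N$ reduces to $\mu^N(x) = -x$, $\delta = \ell\Delta t$, and all norms and inner products are those of $C^N$.

```latex
\begin{aligned}
\|x - y - \delta\mu^N(y)\|_{C^N}^2
  &= \|x - y\|_{C^N}^2 + 2\delta\,\langle x - y,\, y\rangle_{C^N} + \delta^2 \|y\|_{C^N}^2, \\
\|y - x - \delta\mu^N(x)\|_{C^N}^2
  &= \|x - y\|_{C^N}^2 + 2\delta\,\langle y - x,\, x\rangle_{C^N} + \delta^2 \|x\|_{C^N}^2, \\
\text{so their difference is}\quad
  &(\delta^2 - 2\delta)\,\bigl(\|y\|_{C^N}^2 - \|x\|_{C^N}^2\bigr), \\
Q^N(x,\xi^N)
  &= -\tfrac{1}{2}\bigl(\|y\|_{C^N}^2 - \|x\|_{C^N}^2\bigr)
     - \tfrac{1}{4\delta}\,(\delta^2 - 2\delta)\,\bigl(\|y\|_{C^N}^2 - \|x\|_{C^N}^2\bigr) \\
  &= -\tfrac{\delta}{4}\,\bigl(\|y\|_{C^N}^2 - \|x\|_{C^N}^2\bigr).
\end{aligned}
```

The leading terms of order one cancel exactly, which is why the acceptance ratio is governed by the small prefactor $\delta/4$.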
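For $\Psi = 0$ and $x \in X^N$, the proposal $y$ is distributed as
$y = (1 - \ell\Delta t)\, x + \sqrt{2\ell\Delta t}\,(C^N)^{1/2} \xi^N$. It follows that

$\|y\|_{C^N}^2 - \|x\|_{C^N}^2 = -2\ell\Delta t\left(\|x\|_{C^N}^2 - \|(C^N)^{1/2}\xi^N\|_{C^N}^2\right) + (\ell\Delta t)^2 \|x\|_{C^N}^2 + 2\sqrt{2\ell\Delta t}\,(1 - \ell\Delta t)\,\langle x, (C^N)^{1/2}\xi^N \rangle_{C^N}.$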



The details can be found in the proof of Lemma 4.4. Since the Markov chain
$x^N = \{x^{k,N}\}_{k \ge 0}$ evolves in stationarity, for all $k \ge 0$, we have
$x^{k,N} \sim \pi^N = N(0, C^N)$. Therefore, with $x \sim N(0, C^N)$ and $(C^N)^{1/2}\xi^N \sim N(0, C^N)$,
the law of large numbers shows that both $\|x\|_{C^N}^2$ and $\|(C^N)^{1/2}\xi^N\|_{C^N}^2$ are of order $O(N)$,
while the central limit theorem shows that $\langle x, (C^N)^{1/2}\xi^N \rangle_{C^N} = O(N^{1/2})$ and
$\|x\|_{C^N}^2 - \|(C^N)^{1/2}\xi^N\|_{C^N}^2 = O(N^{1/2})$. For $\delta = \ell N^{-\gamma}$ and $\gamma < \frac{1}{3}$, it follows
that

$Q^N(x, \xi^N) = -\frac{\ell^3}{4} N^{1 - 3\gamma} + O(N^{1/2 - 3\gamma/2}),$

which shows that the acceptance probability is exponentially small of
order $\exp(-\frac{\ell^3}{4} N^{1 - 3\gamma})$. The same argument shows that for $\gamma > \frac{1}{3}$, we have
$Q^N(x, \xi^N) \to 0$, which shows that the average acceptance probability
converges to 1. For the critical exponent $\gamma = \frac{1}{3}$, the acceptance probability is
of order $O(1)$. In fact Lemma 4.4 shows that for $\gamma = \frac{1}{3}$, even when $\Psi(\cdot)$ is
nonzero, the following Gaussian approximation holds:

$Q^N(x, \xi^N) \approx N\left(-\frac{\ell^3}{4}, \frac{\ell^3}{2}\right).$

This approximation is key to derivation of the diffusion limit. In summary,
choosing $\gamma < \frac{1}{3}$ leads to exponentially small acceptance probabilities: almost
all the proposals are rejected so that the expected squared jumping distance
$\mathbb{E}_{\pi^N}[\|x^{k+1,N} - x^{k,N}\|^2]$ converges exponentially quickly to 0 as the dimension
$N$ goes to infinity. On the other hand, for any exponent $\gamma \ge \frac{1}{3}$, the
acceptance probabilities are bounded away from zero: the Markov chain moves
with jumps of size $O(N^{-\gamma/2})$, and the expected squared jumping distance
is of order $O(N^{-\gamma})$. If we adopt the expected squared jumping distance as
measure of efficiency, the optimal exponent is thus given by $\gamma = \frac{1}{3}$. This
viewpoint is analyzed further in [6].
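
The three regimes can be checked by simulation in the case $\Psi = 0$. The sketch below works in whitened coordinates $u_j = x_j / \lambda_j$, in which the $C^N$-norm becomes the Euclidean norm, so that no particular choice of eigenvalues is needed; the function names are hypothetical.

```python
import numpy as np

def simulate_Q(N, ell, gamma, n_samples, rng):
    """Samples of the log acceptance ratio Q for Psi = 0, with x drawn in stationarity.

    Uses delta = ell * N**(-gamma) and whitened coordinates u_j = x_j / lambda_j.
    """
    delta = ell * N ** (-gamma)
    u = rng.standard_normal((n_samples, N))               # current state, whitened
    xi = rng.standard_normal((n_samples, N))              # proposal noise
    v = (1.0 - delta) * u + np.sqrt(2.0 * delta) * xi     # proposal, whitened
    return -0.25 * delta * (np.sum(v ** 2, axis=1) - np.sum(u ** 2, axis=1))

def mean_acceptance(Q):
    """Monte Carlo estimate of E[1 ^ e^Q]."""
    return float(np.mean(np.exp(np.minimum(Q, 0.0))))
```

With $\gamma = 1/3$ the sample mean and variance of `simulate_Q` should be close to $-\ell^3/4$ and $\ell^3/2$, while exponents below or above $1/3$ push the mean acceptance towards 0 or 1, respectively.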

2.6. Statement of main theorem. The main result of this article describes
the behavior of the MALA algorithm for the optimal scale $\gamma = \frac{1}{3}$; the
proposal variance is given by $2\delta = 2\ell N^{-1/3}$. In this case, Lemma 4.4 shows that
the local mean acceptance probability $\alpha^N(x, \xi^N) = 1 \wedge e^{Q^N(x, \xi^N)}$ satisfies
$Q^N(x, \xi^N) \to Z_\ell \sim N(-\frac{\ell^3}{4}, \frac{\ell^3}{2})$. As a consequence, the asymptotic mean
acceptance probability of the MALA algorithm can be explicitly computed as
a function of the parameter $\ell > 0$,

$\alpha(\ell) = \lim_{N \to \infty} \mathbb{E}^{\pi^N}[\alpha^N(x, \xi^N)] = \mathbb{E}[1 \wedge e^{Z_\ell}].$

This result is rigorously proved as Corollary 4.6. We then define the "speed
function"

(2.22)  $h(\ell) = \ell\, \alpha(\ell).$

Note that the time step made in the proposal is $\delta = \ell\Delta t$ and that if this
is accepted a fraction $\alpha(\ell)$ of the time, then a naive argument invoking
independence shows that the effective time-step is reduced to $h(\ell)\Delta t$. This
is made rigorous in Theorem 2.6 which shows that the quantity $h(\ell)$ is the
asymptotic speed function of the limiting diffusion obtained by rescaling the
Metropolis-Hastings Markov chain $x^N = \{x^{k,N}\}_{k \ge 0}$.

Theorem 2.6 (Main theorem). Let the reference measure $\pi_0$ and the
function $\Psi(\cdot)$ satisfy Assumption 2.1. Consider the MALA algorithm (2.20)
with initial condition $x^{0,N} \sim \pi^N$. Let $z^N(t)$ be the piecewise linear,
continuous interpolant of the MALA algorithm as defined in (1.4), with $\Delta t = N^{-1/3}$.
Then $z^N(t)$ converges weakly in $C([0,T], \mathcal{H}^s)$ to the diffusion process $z(t)$
given by

(2.23)  $\frac{dz}{dt} = -h(\ell)\,(z + C\nabla\Psi(z)) + \sqrt{2h(\ell)}\,\frac{dW}{dt}$

with initial distribution $z(0) \sim \pi$.

We now explain the following two important implications of this result:

• Since time has to be accelerated by a factor $(\Delta t)^{-1} = N^{1/3}$ in order to
observe a diffusion limit, it follows that in stationarity the work required
to explore the invariant measure scales as $O(N^{1/3})$.

• The speed at which the invariant measure is explored, again in stationarity,
is maximized by choosing $\ell$ so as to maximize $h(\ell)$; this is achieved at
an average acceptance probability 0.574. From a practical point of view,
this shows that one should "tune" the proposal variance of the MALA
algorithm so as to have a mean acceptance probability of 0.574.

The first implication follows from (1.4) since this shows that $O(N^{1/3})$ steps
of the MALA Markov chain (2.20) are required for $z^N(t)$ to approximate
$z(t)$ on a time interval $[0,T]$ long enough for $z(t)$ to have explored its invariant
measure. To understand the second implication, note that if $Z(t)$
solves (2.23) with $h(\ell) = 1$, then, in law, $z(t) = Z(h(\ell)t)$. This result suggests
choosing the value of $\ell$ that maximizes the speed function $h(\cdot)$ since
$z(t)$ will then explore the invariant measure as fast as possible. For practitioners,
who often tune algorithms according to the acceptance probability, it
is relevant to express the maximization principle in terms of the asymptotic
mean acceptance probability $\alpha(\ell)$. Figure 1 shows that the speed function
$h(\cdot)$ is maximized for an optimal acceptance probability of $\alpha^\star = 0.574$, to
three decimal places. This is precisely the argument used in [23] for the case
of product target measures, and it is remarkable that the optimal acceptance
probability identified in that context is also optimal for the nonproduct measures
studied in this paper.

Fig. 1. Optimal acceptance probability = 0.574. (Horizontal axis: acceptance probability.)
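
The value 0.574 can be checked numerically. The sketch below is only an illustration, not part of the paper's argument: it assumes the limiting law $Z_\ell \sim \mathrm{N}(-\ell^3/4, \ell^3/2)$ stated above and uses the standard Gaussian identity $\mathbb{E}[1 \wedge e^{Z}] = 2\Phi(-\sigma/2)$ for $Z \sim \mathrm{N}(-\sigma^2/2, \sigma^2)$. The function names and the golden-section search are choices made for this example.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def mean_acceptance(ell):
    """alpha(ell) = E[1 ^ exp(Z_ell)] with Z_ell ~ N(-ell^3/4, ell^3/2).

    Uses E[1 ^ exp(Z)] = 2 * Phi(-sigma / 2) for Z ~ N(-sigma^2/2, sigma^2).
    """
    sigma = math.sqrt(ell ** 3 / 2.0)
    return 2.0 * normal_cdf(-sigma / 2.0)


def speed(ell):
    """Speed function h(ell) = ell * alpha(ell) of equation (2.22)."""
    return ell * mean_acceptance(ell)


def argmax_golden(f, lo, hi, tol=1e-10):
    """Golden-section search for the maximiser of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            a = c  # the maximiser lies in [c, b]
        else:
            b = d  # the maximiser lies in [a, d]
    return 0.5 * (a + b)


ell_star = argmax_golden(speed, 0.1, 5.0)
alpha_star = mean_acceptance(ell_star)
print(f"ell* = {ell_star:.4f}, alpha(ell*) = {alpha_star:.4f}")
```

The search interval $[0.1, 5]$ is an assumption of the example; on it the speed function first increases and then decreases, which is all the golden-section search needs.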

3. Proof of main theorem. In Section 3.1 we outline the proof strategy
and introduce the drift-martingale decomposition of our discrete-time
Markov chain which underlies it. Section 3.2 contains the statement and proof of
a general diffusion approximation, Proposition 3.1. In Section 3.3 we use this
proposition to prove the main theorem of this paper, pointing to Section 4
for the key estimates required.

3.1. Proof strategy. To communicate the main ideas, we give a heuristic
of the proof before proceeding to give full details in subsequent sections.
Let us first examine a simpler situation: consider a scalar Lipschitz function
$\mu : \mathbb{R} \to \mathbb{R}$ and two scalar constants $\ell, c > 0$. The usual theory of diffusion
approximation for Markov processes [14] shows that the sequence $x^N = \{x^{k,N}\}$
of Markov chains

$x^{k+1,N} - x^{k,N} = \mu(x^{k,N})\,\ell N^{-1/3} + \sqrt{2\ell N^{-1/3}}\; c^{1/2}\,\xi^k$

with i.i.d. $\xi^k \sim \mathrm{N}(0,1)$ converges weakly, when interpolated using a time-acceleration
factor of $N^{1/3}$, to the scalar diffusion $dz(t) = \ell\,\mu(z(t))\,dt + \sqrt{2\ell}\,dW(t)$
where $W$ is a Brownian motion with variance $\mathrm{Var}(W(t)) = ct$.
Also, if $\gamma^k$ is an i.i.d. sequence of Bernoulli random variables with success
rate $\alpha(\ell)$, independent from the Markov chain $x^N$, one can prove that the
sequence $x^N = \{x^{k,N}\}$ of Markov chains given by

$x^{k+1,N} - x^{k,N} = \gamma^k\,\{\mu(x^{k,N})\,\ell N^{-1/3} + \sqrt{2\ell N^{-1/3}}\; c^{1/2}\,\xi^k\}$

converges weakly, when interpolated using a time-acceleration factor $N^{1/3}$,
to the diffusion

$dz(t) = h(\ell)\,\mu(z(t))\,dt + \sqrt{2h(\ell)}\,dW(t),$

where the speed function is given by $h(\ell) = \ell\,\alpha(\ell)$. This shows that the
Bernoulli random variables $\{\gamma^k\}_{k\ge 0}$ have slowed down the original Markov
chain by a factor $\alpha(\ell)$. The proof of Theorem 2.6 is an application of this
idea in a slightly more general setting. The following complications arise:

• Instead of working with scalar diffusions, the result holds for a Hilbert
space-valued diffusion. The correlation structure between the different coordinates
is not present in the preceding simple example and has to be
taken into account.

• Instead of working with a single drift function $\mu$, a sequence of approximations
$d^N$ converging to $\mu$ has to be taken into account.

• The Bernoulli random variables $\gamma^{k,N}$ are not i.i.d. and have an autocorrelation
structure. On top of that, the Bernoulli random variables $\gamma^{k,N}$ are
not independent from the Markov chain $x^{k,N}$. This is the main difficulty
in the proof.

• It should be emphasized that the main theorem uses the fact that the
MALA Markov chain is started at stationarity; this, in particular, implies
that $x^{k,N} \sim \pi^N$ for any $k \ge 0$, which is crucial to the proof of the invariance
principle as it allows us to control the correlation between $\gamma^{k,N}$ and $x^{k,N}$.
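
The scalar heuristic that precedes this list can be made concrete with a short sketch. It is only an illustration under stated assumptions: the Bernoulli variables are taken i.i.d. and independent of the chain (exactly the simplification that fails for MALA), and the function names, the drift $\mu(z) = -z$ and the parameter values in the usage lines are hypothetical choices. The first function computes the exact one-step conditional mean and variance of the slowed-down chain, which match $h(\ell)\,\Delta t\,\mu(x)$ and, to leading order, $2h(\ell)\,\Delta t\,c$.

```python
import math
import random


def lazy_step_moments(x, ell, alpha, c, n, mu):
    """Exact conditional mean and variance of one step of the lazy scalar chain.

    One step is gamma * (mu(x) * ell * dt + sqrt(2 * ell * dt * c) * xi) with
    gamma ~ Bernoulli(alpha), xi ~ N(0, 1) independent, and dt = n ** (-1/3).
    """
    dt = n ** (-1.0 / 3.0)
    drift = mu(x) * ell * dt            # deterministic part of the proposal
    noise_var = 2.0 * ell * dt * c      # variance of the Gaussian part
    mean = alpha * drift
    second_moment = alpha * (drift ** 2 + noise_var)
    return mean, second_moment - mean ** 2


def simulate_lazy_chain(x0, ell, alpha, c, n, mu, steps, rng):
    """Simulate the lazy chain; each move is kept with probability alpha."""
    dt = n ** (-1.0 / 3.0)
    x = x0
    path = [x]
    for _ in range(steps):
        if rng.random() < alpha:
            x = x + mu(x) * ell * dt + math.sqrt(2.0 * ell * dt * c) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


# Hypothetical usage: Ornstein-Uhlenbeck drift, ell = 1.5, alpha = 0.574, c = 1.
rng = random.Random(0)
path = simulate_lazy_chain(0.0, 1.5, 0.574, 1.0, 1000, lambda z: -z, 200, rng)
mean, var = lazy_step_moments(0.7, 1.5, 0.574, 1.0, 10 ** 9, lambda z: -z)
```

Comparing `mean` with $h(\ell)\,\Delta t\,\mu(x)$ and `var` with $2h(\ell)\,\Delta t\,c$ shows the factor $\alpha(\ell)$ entering both the drift and the squared diffusion coefficient, which is the slowdown described in the text.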

The acceptance probability of proposal (2.17) is equal to $\alpha^N(x,\xi^N) = 1 \wedge e^{Q^N(x,\xi^N)}$,
and the quantity $\alpha^N(x) = \mathbb{E}_x[\alpha^N(x,\xi^N)]$, given by (2.19), represents
the mean acceptance probability when the Markov chain $x^N$ stands
at $x$. For our proof it is important to understand how the acceptance probability
$\alpha^N(x,\xi^N)$ depends on the current position $x$ and on the source of
randomness $\xi^N$. Recall the quantity $Q^N$ defined in equation (2.21): the main
observation is that $Q^N(x,\xi^N)$ can be approximated by a Gaussian random
variable

(3.1) $Q^N(x,\xi^N) \approx Z_\ell,$

where $Z_\ell \sim \mathrm{N}(-\ell^3/4,\ \ell^3/2)$. These approximations are made rigorous in Lemma 4.4
and Lemma 4.5. Therefore, the Bernoulli random variable $\gamma^N(x,\xi^N)$
with success probability $1 \wedge e^{Q^N(x,\xi^N)}$ can be approximated by a Bernoulli
random variable, independent of $x$, with success probability equal to

(3.2) $\alpha(\ell) = \mathbb{E}[1 \wedge e^{Z_\ell}].$

Thus, the limiting acceptance probability of the MALA algorithm is as given
in equation (3.2).

Recall that $\Delta t = N^{-1/3}$. With this notation we introduce the drift function
$d^N : \mathcal{H}^s \to \mathcal{H}^s$ given by

(3.3) $d^N(x) = (h(\ell)\,\Delta t)^{-1}\,\mathbb{E}[x^{1,N} - x^{0,N} \mid x^{0,N} = x]$

and the martingale difference array $\{\Gamma^{k,N} : k \ge 0\}$ defined by $\Gamma^{k,N} = \Gamma^N(x^{k,N}, \xi^{k,N})$
with

(3.4) $\Gamma^{k,N} = (2h(\ell)\,\Delta t)^{-1/2}\,(x^{k+1,N} - x^{k,N} - h(\ell)\,\Delta t\; d^N(x^{k,N})).$

The normalization constant $h(\ell)$ defined in equation (2.22) ensures that the
drift function $d^N$ and the martingale difference array $\{\Gamma^{k,N}\}$ are asymptotically
independent from the parameter $\ell$. The drift-martingale decomposition
of the Markov chain $\{x^{k,N}\}_k$ then reads

(3.5) $x^{k+1,N} - x^{k,N} = h(\ell)\,\Delta t\; d^N(x^{k,N}) + \sqrt{2h(\ell)\,\Delta t}\;\Gamma^{k,N}.$

Lemma 4.7 and Lemma 4.8 exploit the Gaussian behavior of $Q^N(x,\xi^N)$,
described in equation (3.1), in order to give quantitative versions of the
following approximations:

(3.6) $d^N(x) \approx \mu(x)$ and $\Gamma^{k,N} \approx \mathrm{N}(0, \mathcal{C}),$

where the function $\mu(\cdot)$ is defined by equation (2.14). From equation (3.5)
it follows that for large $N$ the evolution of the Markov chain resembles the
Euler discretization of the limiting diffusion (2.23). The next step consists of
proving an invariance principle for a rescaled version of the martingale difference
array $\{\Gamma^{k,N}\}$. The continuous process $W^N \in C([0,T], \mathcal{H}^s)$ is defined
as

(3.7) $W^N(t) = \sqrt{\Delta t}\,\sum_{j=0}^{k} \Gamma^{j,N} + \dfrac{t - k\,\Delta t}{\sqrt{\Delta t}}\,\Gamma^{k+1,N}$ for $k\,\Delta t \le t < (k+1)\,\Delta t$.

The sequence of processes $\{W^N\}_{N\ge 1}$ converges weakly as $N \to \infty$ in $C([0,T], \mathcal{H}^s)$
to a Brownian motion $W$ in $\mathcal{H}^s$ with covariance operator equal to $\mathcal{C}_s$.
Indeed, Proposition 4.10 proves the stronger result

$(x^{0,N}, W^N) \Longrightarrow (z^0, W),$

where $\Longrightarrow$ denotes weak convergence in $\mathcal{H}^s \times C([0,T], \mathcal{H}^s)$, and $z^0 \sim \pi$ is
independent of the limiting Brownian motion $W$. Using this invariance principle
and the fact that the noise process is additive [the diffusion coefficient
of the SPDE (2.23) is constant], the main theorem follows from a continuous
mapping argument which we now outline. For any $W \in C([0,T]; \mathcal{H}^s)$ we
define the Itô map

$\Theta : \mathcal{H}^s \times C([0,T]; \mathcal{H}^s) \to C([0,T]; \mathcal{H}^s)$

which maps $(z^0, W)$ to the unique solution of the integral equation

(3.8) $z(t) = z^0 + h(\ell)\int_0^t \mu(z)\,du + \sqrt{2h(\ell)}\,W(t) \quad \forall t \in [0,T].$

Notice that $z = \Theta(z^0, W)$ solves the SPDE (2.23), since $\mu(z) = -(z + \mathcal{C}\nabla\Psi(z))$. The Itô map $\Theta$ is continuous,
essentially because the noise in (2.23) is additive (does not depend on
the state $z$). The piecewise constant interpolant $\bar z^N$ of $x^N$ is defined by

(3.9) $\bar z^N(t) = x^{k,N}$ for $k\,\Delta t \le t < (k+1)\,\Delta t$.

Using this definition it follows that the continuous piecewise linear interpolant
$z^N$, defined in equation (1.4), satisfies

(3.10) $z^N(t) = x^{0,N} + h(\ell)\int_0^t d^N(\bar z^N(u))\,du + \sqrt{2h(\ell)}\,W^N(t) \quad \forall t \in [0,T].$

Using the closeness of $d^N(\cdot)$ and $\mu(\cdot)$, and of $\bar z^N$ and $z^N$, we will see that
there exists a process $\widehat W^N \Longrightarrow W$ as $N \to \infty$ such that

$z^N(t) = x^{0,N} + h(\ell)\int_0^t \mu(z^N(u))\,du + \sqrt{2h(\ell)}\,\widehat W^N(t).$

Thus we may write $z^N = \Theta(x^{0,N}, \widehat W^N)$. By continuity of the Itô map $\Theta$, it
follows from the continuous mapping theorem that $z^N = \Theta(x^{0,N}, \widehat W^N) \Longrightarrow
\Theta(z^0, W) = z$ as $N$ goes to infinity. This weak convergence result is the
principal result of this article.
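
The continuity of the Itô map for additive noise can be illustrated in the scalar case. The sketch below is a hypothetical discretisation, not the paper's construction: it solves the integral equation $z(t) = z^0 + h\int_0^t \mu(z)\,du + \sqrt{2h}\,W(t)$ on a uniform grid with a left-point rule for a given path $W$. For a drift with Lipschitz constant $L$, a discrete Gronwall argument then gives $\sup_t |z(t) - z'(t)| \le (|z^0 - z'^0| + \sqrt{2h}\,\sup_t|W(t) - W'(t)|)\,e^{hLT}$, which is the stability that the continuous mapping argument relies on.

```python
import math


def ito_map(z0, w, h, mu, dt):
    """Grid approximation of z(t) = z0 + h * int_0^t mu(z) du + sqrt(2h) * W(t).

    `w` holds the driving path on the grid 0, dt, 2*dt, ...; the integral is
    approximated with a left-point rule, so this is a sketch of the scalar case.
    """
    scale = math.sqrt(2.0 * h)
    z = [z0 + scale * w[0]]
    integral = 0.0
    for k in range(1, len(w)):
        integral += mu(z[-1]) * dt
        z.append(z0 + h * integral + scale * w[k])
    return z


def sup_distance(a, b):
    """Supremum distance between two paths sampled on the same grid."""
    return max(abs(u - v) for u, v in zip(a, b))


# Hypothetical usage: two nearby driving paths and two nearby initial conditions.
dt, n_points, h = 0.01, 101, 0.86
w1 = [math.sin(3.0 * k * dt) for k in range(n_points)]
w2 = [w + 0.01 * math.cos(5.0 * k * dt) for k, w in enumerate(w1)]
z1 = ito_map(0.30, w1, h, lambda z: -z, dt)
z2 = ito_map(0.32, w2, h, lambda z: -z, dt)
gap = sup_distance(z1, z2)
```

Because the noise enters additively, the map needs only the path $W$ itself, not a stochastic integral against it, which is why it is defined (and continuous) on all of $C([0,T])$.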

3.2. General diffusion approximation. In this section we state and prove
a proposition containing a general diffusion approximation result. Using this,
we then prove our main theorem in Section 3.3. To this end, consider a general
sequence of Markov chains $x^N = \{x^{k,N}\}_{k\ge 0}$ evolving at stationarity in
the separable Hilbert space $\mathcal{H}^s$, and introduce the drift-martingale decomposition

(3.11) $x^{k+1,N} - x^{k,N} = h(\ell)\,d^N(x^{k,N})\,\Delta t + \sqrt{2h(\ell)\,\Delta t}\;\Gamma^{k,N},$

where $h(\ell) > 0$ is a constant parameter, and $\Delta t$ is a time-step decreasing to
0 as $N$ goes to infinity. Here $d^N$ and $\Gamma^{k,N}$ are as defined above. We introduce
the rescaled process $W^N(t)$ as in (3.7). The main diffusion approximation
result is the following.

Proposition 3.1 (General diffusion approximation for Markov chains).
Consider a separable Hilbert space $(\mathcal{H}^s, \langle\cdot,\cdot\rangle_s)$ and a sequence of $\mathcal{H}^s$-valued
Markov chains $x^N = \{x^{k,N}\}_{k\ge 0}$ with invariant distribution $\pi^N$. Suppose
that the Markov chains start at stationarity $x^{0,N} \sim \pi^N$ and that the drift-martingale
decomposition (3.11) satisfies the following assumptions:

(1) Convergence of initial conditions: $\pi^N$ converges in distribution to the
probability measure $\pi$ where $\pi$ has a finite first moment, that is, $\mathbb{E}^{\pi}[\|x\|_s] < \infty$.

(2) Invariance principle: the sequence $(x^{0,N}, W^N)$, defined by equation
(3.7), converges weakly in $\mathcal{H}^s \times C([0,T], \mathcal{H}^s)$ to $(z^0, W)$ where $z^0 \sim \pi$, and
$W$ is a Brownian motion in $\mathcal{H}^s$, independent from $z^0$, with covariance operator $\mathcal{C}_s$.

(3) Convergence of the drift: There exists a globally Lipschitz function
$\mu : \mathcal{H}^s \to \mathcal{H}^s$ that satisfies

$\lim_{N\to\infty} \mathbb{E}^{\pi^N}[\|d^N(x) - \mu(x)\|_s] = 0.$

Then the sequence of rescaled interpolants $z^N \in C([0,T], \mathcal{H}^s)$, defined by
equation (1.4), converges weakly in $C([0,T], \mathcal{H}^s)$ to $z \in C([0,T], \mathcal{H}^s)$ given
by

$\dfrac{dz}{dt} = h(\ell)\,\mu(z(t)) + \sqrt{2h(\ell)}\,\dfrac{dW}{dt}, \qquad z(0) \sim \pi.$

Here $W$ is a Brownian motion in $\mathcal{H}^s$ with covariance $\mathcal{C}_s$ and initial condition
$z^0 \sim \pi$ independent of $W$.

Proof. Define $\bar z^N(t)$ as in (3.9). It then follows that

(3.12) $z^N(t) = x^{0,N} + h(\ell)\int_0^t d^N(\bar z^N(u))\,du + \sqrt{2h(\ell)}\,W^N(t)$
$\qquad\quad\ \ = x^{0,N} + h(\ell)\int_0^t \mu(z^N(u))\,du + \sqrt{2h(\ell)}\,\widehat W^N(t),$

where the process $W^N \in C([0,T], \mathcal{H}^s)$ is defined by equation (3.7) and

$\widehat W^N(t) = W^N(t) + \sqrt{\dfrac{h(\ell)}{2}}\int_0^t [d^N(\bar z^N(u)) - \mu(z^N(u))]\,du.$

Define the Itô map $\Theta : \mathcal{H}^s \times C([0,T]; \mathcal{H}^s) \to C([0,T]; \mathcal{H}^s)$ that maps $(z_0, W)$
to the unique solution $z \in C([0,T], \mathcal{H}^s)$ of the integral equation

$z(t) = z_0 + h(\ell)\int_0^t \mu(z(u))\,du + \sqrt{2h(\ell)}\,W(t) \quad \forall t \in [0,T].$

Equation (3.12) is thus equivalent to $z^N = \Theta(x^{0,N}, \widehat W^N)$. The proof of the
diffusion approximation is accomplished through the following steps:

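• The Itô map $\Theta : \mathcal{H}^s \times C([0,T], \mathcal{H}^s) \to C([0,T], \mathcal{H}^s)$ is continuous. This is
Lemma 3.7 of [19].

• The pair $(x^{0,N}, \widehat W^N)$ converges weakly to $(z^0, W)$. In a separable Hilbert
space, if the sequence $\{a_n\}_{n\in\mathbb{N}}$ converges weakly to $a$, and the sequence
$\{b_n\}_{n\in\mathbb{N}}$ converges in probability to 0, then the sequence $\{a_n + b_n\}_{n\in\mathbb{N}}$
converges weakly to $a$. It is assumed that $(x^{0,N}, W^N)$ converges weakly to
$(z^0, W)$ in $\mathcal{H}^s \times C([0,T], \mathcal{H}^s)$. Consequently, to prove that $\widehat W^N$ converges
weakly to $W$, it suffices to prove that $\int_0^T \|d^N(\bar z^N(u)) - \mu(z^N(u))\|_s\,du$
converges in probability to 0. For any time $k\,\Delta t \le u < (k+1)\,\Delta t$, the
stationarity of the chain shows that

$\|d^N(\bar z^N(u)) - \mu(\bar z^N(u))\|_s = \|d^N(x^{k,N}) - \mu(x^{k,N})\|_s \overset{\mathcal{D}}{\sim} \|d^N(x^{0,N}) - \mu(x^{0,N})\|_s,$
$\|\mu(\bar z^N(u)) - \mu(z^N(u))\|_s \le \|\mu\|_{\mathrm{Lip}}\cdot\|x^{k+1,N} - x^{k,N}\|_s \overset{\mathcal{D}}{\sim} \|\mu\|_{\mathrm{Lip}}\cdot\|x^{1,N} - x^{0,N}\|_s,$

where in the last step we have used the fact that $\|\bar z^N(u) - z^N(u)\|_s \le
\|x^{k+1,N} - x^{k,N}\|_s$. Consequently,

$\mathbb{E}^{\pi^N}\Big[\int_0^T \|d^N(\bar z^N(u)) - \mu(z^N(u))\|_s\,du\Big]$
$\qquad \le T\cdot\mathbb{E}^{\pi^N}[\|d^N(x^{0,N}) - \mu(x^{0,N})\|_s] + T\cdot\|\mu\|_{\mathrm{Lip}}\cdot\mathbb{E}^{\pi^N}[\|x^{1,N} - x^{0,N}\|_s].$

The first term goes to zero since it is assumed that $\lim_N \mathbb{E}^{\pi^N}[\|d^N(x) - \mu(x)\|_s] = 0$.
Since $\mathrm{Tr}_{\mathcal{H}^s}(\mathcal{C}_s) < \infty$, the second term is of order $O(\sqrt{\Delta t})$ and
thus also converges to 0. Therefore $\widehat W^N$ converges weakly to $W$, hence
the conclusion.

• Continuous mapping argument. We have proved that $(x^{0,N}, \widehat W^N)$ converges
weakly in $\mathcal{H}^s \times C([0,T], \mathcal{H}^s)$ to $(z^0, W)$, and the Itô map $\Theta : \mathcal{H}^s \times
C([0,T], \mathcal{H}^s) \to C([0,T], \mathcal{H}^s)$ is a continuous function. The continuous mapping
theorem thus shows that $z^N = \Theta(x^{0,N}, \widehat W^N)$ converges weakly to
$z = \Theta(z^0, W)$, finishing the proof of Proposition 3.1. □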



3.3. Proof of main theorem. We now prove Theorem 2.6. The proof consists
of checking that the conditions needed for Proposition 3.1 to apply are
satisfied by the sequence of MALA Markov chains (2.20). The key estimates
are proved later in Section 4.

(1) By Lemma 4.3 the sequence of probability measures $\pi^N$ converges
weakly in $\mathcal{H}^s$ to $\pi$.

(2) Proposition 4.10 proves that $(x^{0,N}, W^N)$ converges weakly in $\mathcal{H}^s \times
C([0,T], \mathcal{H}^s)$ to $(z^0, W)$, where $W$ is a Brownian motion with covariance $\mathcal{C}_s$
independent from $z^0 \sim \pi$.

(3) Lemma 4.7 states that $d^N(x)$, defined by equation (3.3), satisfies
$\lim_N \mathbb{E}^{\pi^N}[\|d^N(x) - \mu(x)\|_s^2] = 0$, and Proposition 2.4 shows that $\mu : \mathcal{H}^s \to \mathcal{H}^s$
is a Lipschitz function.

The three assumptions needed for Proposition 3.1 to apply are satisfied, which
concludes the proof of Theorem 2.6.

4. Key estimates. Section 4.1 contains some technical lemmas of use
throughout. In Section 4.2 we study the large $N$ Gaussian approximation of
the acceptance probability, simultaneously establishing asymptotic independence
of the current state of the Markov chain. This approximation is then
used in Sections 4.3 and 4.4 to give quantitative versions of the heuristics
(3.6). The section concludes with Section 4.5 in which we prove an invariance
principle for $W^N$ given by (3.7).

4.1. Technical lemmas. The first lemma shows that, for $\pi_0$-almost every
function $x \in \mathcal{H}^s$, the approximation $\mu^N(x) \approx \mu(x)$ holds as $N$ goes to infinity.

Lemma 4.1 ($\mu^N$ converges $\pi_0$-almost surely to $\mu$). Let Assumption 2.1
hold. The sequence of functions $\mu^N : \mathcal{H}^s \to \mathcal{H}^s$ satisfies

$\pi_0\big(\{x \in \mathcal{H}^s : \lim_{N\to\infty} \|\mu^N(x) - \mu(x)\|_s = 0\}\big) = 1.$

Proof. It is enough to verify that for $x \in \mathcal{H}^s$, we have

(4.1) $\lim_{N\to\infty} \|P^N x - x\|_s = 0,$

(4.2) $\lim_{N\to\infty} \|\mathcal{C}P^N\nabla\Psi(P^N x) - \mathcal{C}\nabla\Psi(x)\|_s = 0.$

• Let us prove equation (4.1). For $x \in \mathcal{H}^s$, we have $\sum_{j\ge 1} j^{2s} x_j^2 < \infty$ so that

(4.3) $\lim_{N\to\infty} \|P^N x - x\|_s^2 = \lim_{N\to\infty} \sum_{j=N+1}^{\infty} j^{2s} x_j^2 = 0.$

• Let us prove (4.2). The triangle inequality shows that

$\|\mathcal{C}P^N\nabla\Psi(P^N x) - \mathcal{C}\nabla\Psi(x)\|_s$
$\qquad \le \|\mathcal{C}P^N\nabla\Psi(P^N x) - \mathcal{C}P^N\nabla\Psi(x)\|_s + \|\mathcal{C}P^N\nabla\Psi(x) - \mathcal{C}\nabla\Psi(x)\|_s.$

The same proof as Lemma 2.4 reveals that $\mathcal{C}P^N\nabla\Psi : \mathcal{H}^s \to \mathcal{H}^s$ is globally
Lipschitz, with a Lipschitz constant that can be chosen independently
of $N$. Consequently, equation (4.3) shows that

$\|\mathcal{C}P^N\nabla\Psi(P^N x) - \mathcal{C}P^N\nabla\Psi(x)\|_s \lesssim \|P^N x - x\|_s \to 0.$

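Also, $z = \nabla\Psi(x) \in \mathcal{H}^{-s}$ so that $\|\nabla\Psi(x)\|_{-s}^2 = \sum_{j\ge 1} j^{-2s} z_j^2 < \infty$. The
eigenvalues of $\mathcal{C}$ satisfy $\lambda_j^2 \asymp j^{-2\kappa}$ with $s < \kappa - 1/2$. Consequently,

$\|\mathcal{C}P^N\nabla\Psi(x) - \mathcal{C}\nabla\Psi(x)\|_s^2 = \sum_{j=N+1}^{\infty} j^{2s}(\lambda_j^2 z_j)^2 \lesssim \sum_{j=N+1}^{\infty} j^{2s-4\kappa} z_j^2$
$\qquad = \sum_{j=N+1}^{\infty} j^{4(s-\kappa)}\, j^{-2s} z_j^2 \le \dfrac{1}{(N+1)^{4(\kappa - s)}}\,\|\nabla\Psi(x)\|_{-s}^2 \to 0.$ □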



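The next lemma shows that the size of the jump $y - x$ is of order $\sqrt{\Delta t}$.

Lemma 4.2. Consider $y$ given by (2.17). Under Assumption 2.1, for any
$p \ge 1$, we have

$\mathbb{E}_x^{\pi^N}[\|y - x\|_s^p] \lesssim (\Delta t)^{p/2}\cdot(1 + \|x\|_s^p).$

Proof. Under Assumption 2.1 the function $\mu^N$ is globally Lipschitz on
$\mathcal{H}^s$, with Lipschitz constant that can be chosen independently of $N$. Thus

$\|y - x\|_s \lesssim \Delta t\,(1 + \|x\|_s) + \sqrt{\Delta t}\,\|\mathcal{C}^{1/2}\xi^N\|_s.$

We have $\mathbb{E}^{\pi_0}[\|\mathcal{C}^{1/2}\xi^N\|_s^p] \le \mathbb{E}^{\pi_0}[\|\zeta\|_s^p]$, where $\zeta \sim \mathrm{N}(0, \mathcal{C})$. From Fernique's
theorem [12], it follows that $\mathbb{E}^{\pi_0}[\|\zeta\|_s^p] < \infty$. Consequently,
$\mathbb{E}^{\pi_0}[\|\mathcal{C}^{1/2}\xi^N\|_s^p]$ is uniformly bounded as a function of $N$, proving the lemma.
□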



The normalizing constants $M_{\Psi^N}$ are uniformly bounded, and we use this
fact to obtain uniform bounds on moments of functionals in $\mathcal{H}^s$ under $\pi^N$.
Moreover, we prove that the sequence of probability measures $\pi^N$ on $\mathcal{H}^s$
converges weakly in $\mathcal{H}^s$ to $\pi$.

Lemma 4.3 (Finite dimensional approximation $\pi^N$ of $\pi$). Under Assumption 2.1
the normalization constants $M_{\Psi^N}$ are uniformly bounded so
that for any measurable functional $f : \mathcal{H}^s \to \mathbb{R}$, we have

$\mathbb{E}^{\pi^N}[|f(x)|] \lesssim \mathbb{E}^{\pi_0}[|f(x)|].$

Moreover, the sequence of probability measures $\pi^N$ satisfies

$\pi^N \Longrightarrow \pi,$

where $\Longrightarrow$ denotes weak convergence in $\mathcal{H}^s$.

Proof. The first part is contained in Lemma 3.5 of [19]. Let us prove
that $\pi^N \Longrightarrow \pi$. We need to show that for any bounded continuous function
$g : \mathcal{H}^s \to \mathbb{R}$ we have $\lim_{N\to\infty} \mathbb{E}^{\pi^N}[g(x)] = \mathbb{E}^{\pi}[g(x)]$ where

$\mathbb{E}^{\pi^N}[g(x)] = \mathbb{E}^{\pi_0}[g(P^N x)\,M_{\Psi^N}\,e^{-\Psi(P^N x)}].$

Since $g$ is bounded, $\Psi$ is lower bounded, and since the normalization constants
are uniformly bounded, the dominated convergence theorem shows
that it suffices to show that $g(P^N x)\,M_{\Psi^N}\,e^{-\Psi(P^N x)}$ converges $\pi_0$-almost surely
to $g(x)\,M_{\Psi}\,e^{-\Psi(x)}$. For this in turn it suffices to show that $\Psi(P^N x)$ converges
$\pi_0$-almost surely to $\Psi(x)$, as this also proves almost sure convergence of the
normalization constants. By (2.7) we have

$|\Psi(P^N x) - \Psi(x)| \lesssim (1 + \|x\|_s + \|P^N x\|_s)\,\|P^N x - x\|_s.$

But $\lim_{N\to\infty} \|P^N x - x\|_s = 0$ for any $x \in \mathcal{H}^s$, by dominated convergence,
and the result follows. □

Fernique's theorem [12] states that for any exponent $p \ge 0$, we have
$\mathbb{E}^{\pi_0}[\|x\|_s^p] < \infty$. It thus follows from Lemma 4.3 that for any $p \ge 0$,

$\sup\{\mathbb{E}^{\pi^N}[\|x\|_s^p] : N \in \mathbb{N}\} < \infty.$

This estimate is repeatedly used in the sequel.

4.2. Gaussian approximation of $Q^N$. Recall the quantity $Q^N$ defined in
equation (2.21). This section proves that $Q^N$ has a Gaussian behavior in the
sense that

(4.4) $Q^N(x,\xi^N) = Z^N(x,\xi^N) + i^N(x,\xi^N) + e^N(x,\xi^N),$

where the quantities $Z^N$ and $i^N$ are equal to

(4.5) $Z^N(x,\xi) = -\dfrac{\ell^3}{4} - \dfrac{\ell^{3/2}}{\sqrt{2}}\,N^{-1/2}\,\langle x, (\mathcal{C}^N)^{1/2}\xi\rangle_{\mathcal{C}^N},$

(4.6) $i^N(x,\xi) = \dfrac{\ell^2}{2}\,N^{-2/3}\,\big(\|x\|_{\mathcal{C}^N}^2 - \|(\mathcal{C}^N)^{1/2}\xi\|_{\mathcal{C}^N}^2\big)$

with $i^N$ and $e^N$ small. Thus the principal contribution to $Q^N$ comes from
the random variable $Z^N(x,\xi^N)$. Notice that, for each fixed $x \in \mathcal{H}^s$, the
random variable $Z^N(x,\xi^N)$ is Gaussian. Furthermore, the Karhunen-Loève
expansion of $\pi_0$ shows that for $\pi_0$-almost every choice of function $x \in \mathcal{H}^s$
the sequence $\{Z^N(x,\xi^N)\}_{N\ge 1}$ converges in law to the distribution of
$Z_\ell \sim \mathrm{N}(-\ell^3/4,\ \ell^3/2)$. The next lemma rigorously bounds the error terms $e^N(x,\xi^N)$
and $i^N(x,\xi^N)$: we show that $i^N$ is an error term of order $O(N^{-1/6})$ and
$e^N(x,\xi)$ is an error term of order $O(N^{-1/3})$. In Lemma 4.5 we then quantify
the convergence of $Z^N(x,\xi^N)$ to $Z_\ell$.

Lemma 4.4 (Gaussian approximation). Let $p \ge 1$ be an integer. Under
Assumption 2.1, the error terms $i^N$ and $e^N$ in the Gaussian approximation
(4.4) satisfy

(4.7) $(\mathbb{E}^{\pi^N}[|i^N(x,\xi^N)|^p])^{1/p} = O(N^{-1/6})$ and $(\mathbb{E}^{\pi^N}[|e^N(x,\xi^N)|^p])^{1/p} = O(N^{-1/3}).$

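Proof. For notational clarity, without loss of generality, we suppose
$p = 2q$. The quantity $Q^N$ is defined in equation (2.21), and expanding terms
leads to

$Q^N(x,\xi^N) = I_1 + I_2 + I_3,$

where the quantities $I_1$, $I_2$ and $I_3$ are given by

$I_1 = -\dfrac{1}{2}\big(\|y\|_{\mathcal{C}^N}^2 - \|x\|_{\mathcal{C}^N}^2\big) - \dfrac{1}{4\ell\Delta t}\big(\|x - y(1 - \ell\Delta t)\|_{\mathcal{C}^N}^2 - \|y - x(1 - \ell\Delta t)\|_{\mathcal{C}^N}^2\big),$

$I_2 = -(\Psi^N(y) - \Psi^N(x)) - \dfrac{1}{2}\big(\langle x - y(1 - \ell\Delta t), \mathcal{C}^N\nabla\Psi^N(y)\rangle_{\mathcal{C}^N} - \langle y - x(1 - \ell\Delta t), \mathcal{C}^N\nabla\Psi^N(x)\rangle_{\mathcal{C}^N}\big),$

$I_3 = -\dfrac{\ell\Delta t}{4}\big\{\|\mathcal{C}^N\nabla\Psi^N(y)\|_{\mathcal{C}^N}^2 - \|\mathcal{C}^N\nabla\Psi^N(x)\|_{\mathcal{C}^N}^2\big\}.$

The term $I_1$ arises purely from the Gaussian part of the target measure $\pi^N$
and from the Gaussian part of the proposal. The two other terms $I_2$ and $I_3$
come from the change of probability involving the functional $\Psi^N$. We start
by simplifying the expression for $I_1$, and then return to estimate the terms
$I_2$ and $I_3$:

$I_1 = -\dfrac{1}{2}\big(\|y\|_{\mathcal{C}^N}^2 - \|x\|_{\mathcal{C}^N}^2\big) - \dfrac{1}{4\ell\Delta t}\big(\|(x - y) + \ell\Delta t\,y\|_{\mathcal{C}^N}^2 - \|(y - x) + \ell\Delta t\,x\|_{\mathcal{C}^N}^2\big)$
$\quad = -\dfrac{1}{2}\big(\|y\|_{\mathcal{C}^N}^2 - \|x\|_{\mathcal{C}^N}^2\big) - \dfrac{1}{4\ell\Delta t}\big(2\ell\Delta t\,[\|x\|_{\mathcal{C}^N}^2 - \|y\|_{\mathcal{C}^N}^2] + (\ell\Delta t)^2[\|y\|_{\mathcal{C}^N}^2 - \|x\|_{\mathcal{C}^N}^2]\big)$
$\quad = -\dfrac{\ell\Delta t}{4}\big(\|y\|_{\mathcal{C}^N}^2 - \|x\|_{\mathcal{C}^N}^2\big).$

The term $I_1$ is $O(1)$ and constitutes the main contribution to $Q^N$. Before
analyzing $I_1$ in more detail, we show that $I_2$ and $I_3$ are $O(N^{-1/3})$,

(4.8) $(\mathbb{E}^{\pi^N}[I_2^{2q}])^{1/(2q)} = O(N^{-1/3})$ and $(\mathbb{E}^{\pi^N}[I_3^{2q}])^{1/(2q)} = O(N^{-1/3}).$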

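• We expand $I_2$ and use the bound on the remainder of the Taylor expansion
of $\Psi$ described in equation (2.15),

$I_2 = -\{\Psi^N(y) - [\Psi^N(x) + \langle\nabla\Psi^N(x), y - x\rangle]\}$
$\qquad + \dfrac{1}{2}\langle y - x, \nabla\Psi^N(y) - \nabla\Psi^N(x)\rangle$
$\qquad + \dfrac{\ell\Delta t}{2}\{\langle x, \nabla\Psi^N(x)\rangle - \langle y, \nabla\Psi^N(y)\rangle\}$
$\quad = A_1 + A_2 + A_3.$

Equation (2.15) and Lemma 4.2 show that

$\mathbb{E}^{\pi^N}[A_1^{2q}] \lesssim \mathbb{E}^{\pi^N}[\|y - x\|_s^{4q}] \lesssim (\Delta t)^{2q}\,\mathbb{E}^{\pi^N}[1 + \|x\|_s^{4q}] \lesssim (\Delta t)^{2q} = (N^{-1/3})^{2q},$

where we have used the fact that $\mathbb{E}^{\pi^N}[\|x\|_s^{4q}] \lesssim \mathbb{E}^{\pi_0}[\|x\|_s^{4q}] < \infty$. Assumption 2.1
states that $\partial^2\Psi^N$ is uniformly bounded in $\mathcal{L}(\mathcal{H}^s, \mathcal{H}^{-s})$ so that

(4.9) $\|\nabla\Psi^N(y) - \nabla\Psi^N(x)\|_{-s} = \Big\|\int_0^1 \partial^2\Psi^N(x + t(y - x))\cdot(y - x)\,dt\Big\|_{-s}$
$\qquad \le \int_0^1 \|\partial^2\Psi^N(x + t(y - x))\cdot(y - x)\|_{-s}\,dt \le M_4 \int_0^1 \|y - x\|_s\,dt.$

This proves that $\|\nabla\Psi^N(y) - \nabla\Psi^N(x)\|_{-s} \lesssim \|y - x\|_s$. Consequently, Lemma 4.2
shows that

$\mathbb{E}^{\pi^N}[A_2^{2q}] \lesssim \mathbb{E}^{\pi^N}[\|y - x\|_s^{2q}\cdot\|\nabla\Psi^N(y) - \nabla\Psi^N(x)\|_{-s}^{2q}]$
$\qquad \lesssim \mathbb{E}^{\pi^N}[\|y - x\|_s^{4q}]$
$\qquad \lesssim (\Delta t)^{2q}\,\mathbb{E}^{\pi^N}[1 + \|x\|_s^{4q}].$

Under Assumption 2.1, for any $z \in \mathcal{H}^s$ we have $\|\nabla\Psi^N(z)\|_{-s} \lesssim 1 + \|z\|_s$.
Therefore $\mathbb{E}^{\pi^N}[A_3^{2q}] \lesssim (\Delta t)^{2q}$. Putting these estimates together,

$(\mathbb{E}^{\pi^N}[I_2^{2q}])^{1/(2q)} \lesssim (\mathbb{E}^{\pi^N}[A_1^{2q} + A_2^{2q} + A_3^{2q}])^{1/(2q)} = O(N^{-1/3}).$

• Lemma 2.4 states that $\mathcal{C}^N\nabla\Psi^N : \mathcal{H}^s \to \mathcal{H}^s$ is globally Lipschitz, with a Lipschitz
constant that can be chosen uniformly in $N$. Therefore,

(4.10) $\|\mathcal{C}^N\nabla\Psi^N(z)\|_s \lesssim 1 + \|z\|_s.$

Since $\|\mathcal{C}^N\nabla\Psi^N(z)\|_{\mathcal{C}^N}^2 = \langle\nabla\Psi^N(z), \mathcal{C}^N\nabla\Psi^N(z)\rangle$, bound (2.7) gives

$\mathbb{E}^{\pi^N}[I_3^{2q}] \lesssim \Delta t^{2q}\,\mathbb{E}^{\pi^N}[(1 + \|x\|_s)^{4q} + (1 + \|y\|_s)^{4q}]$
$\qquad \lesssim \Delta t^{2q}\,\mathbb{E}^{\pi^N}[1 + \|x\|_s^{4q} + \|y\|_s^{4q}] \lesssim \Delta t^{2q} = (N^{-1/3})^{2q},$

which concludes the proof of equation (4.8).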

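We now simplify further the expression for $I_1$ and demonstrate that it
has a Gaussian behavior. We use the definition of the proposal $y$, given in
equation (2.17), to expand $I_1$. For $x \in X^N$ we have $P^N x = x$. Therefore, for
$x \in X^N$,

$I_1 = -\dfrac{\ell\Delta t}{4}\big(\|(1 - \ell\Delta t)x - \ell\Delta t\,\mathcal{C}^N\nabla\Psi^N(x) + \sqrt{2\ell\Delta t}\,(\mathcal{C}^N)^{1/2}\xi^N\|_{\mathcal{C}^N}^2 - \|x\|_{\mathcal{C}^N}^2\big)$
$\quad = Z^N(x,\xi^N) + i^N(x,\xi^N) + B_1 + B_2 + B_3 + B_4,$

with $Z^N(x,\xi^N)$ and $i^N(x,\xi^N)$ given by equations (4.5) and (4.6) and

$B_1 = \dfrac{\ell^3}{4}\Big(1 - \dfrac{\|x\|_{\mathcal{C}^N}^2}{N}\Big),$
$B_2 = -\dfrac{\ell^3}{4}\,N^{-1}\{\|\mathcal{C}^N\nabla\Psi^N(x)\|_{\mathcal{C}^N}^2 + 2\langle x, \nabla\Psi^N(x)\rangle\},$
$B_3 = \dfrac{\ell^{5/2}}{\sqrt{2}}\,N^{-5/6}\,\langle x + \mathcal{C}^N\nabla\Psi^N(x), (\mathcal{C}^N)^{1/2}\xi^N\rangle_{\mathcal{C}^N},$
$B_4 = \dfrac{\ell^2}{2}\,N^{-2/3}\,\langle x, \nabla\Psi^N(x)\rangle.$

The quantity $Z^N$ is the leading term. For each fixed value of $x \in \mathcal{H}^s$, the
term $Z^N(x,\xi^N)$ is Gaussian. Below, we prove that the quantity $i^N$ is $O(N^{-1/6})$.
We now establish that each $B_j$ is $O(N^{-1/3})$,

(4.11) $(\mathbb{E}^{\pi^N}[B_j^{2q}])^{1/(2q)} = O(N^{-1/3}), \qquad j = 1,\dots,4.$

• Lemma 4.3 shows that $\mathbb{E}^{\pi^N}[(1 - \|x\|_{\mathcal{C}^N}^2/N)^{2q}] \lesssim \mathbb{E}^{\pi_0}[(1 - \|x\|_{\mathcal{C}^N}^2/N)^{2q}]$. Under $\pi_0$,

$\dfrac{\|x\|_{\mathcal{C}^N}^2}{N} \sim \dfrac{\rho_1^2 + \cdots + \rho_N^2}{N},$

where $\rho_1,\dots,\rho_N$ are i.i.d. $\mathrm{N}(0,1)$ Gaussian random variables. Consequently,
$B_1$ satisfies the bound (4.11).

• The term $\|\mathcal{C}^N\nabla\Psi^N(x)\|_{\mathcal{C}^N}^{2q}$ has already been bounded while proving
$\mathbb{E}^{\pi^N}[I_3^{2q}] \lesssim (N^{-1/3})^{2q}$. Equation (2.7) gives the bound $\|\nabla\Psi^N(x)\|_{-s} \lesssim
1 + \|x\|_s$ and shows that $\mathbb{E}^{\pi^N}[\langle x, \nabla\Psi^N(x)\rangle^{2q}]$ is uniformly bounded as
a function of $N$. Consequently, $B_2$ satisfies the bound (4.11).

• We have $\langle\mathcal{C}^N\nabla\Psi^N(x), (\mathcal{C}^N)^{1/2}\xi^N\rangle_{\mathcal{C}^N} = \langle\nabla\Psi^N(x), (\mathcal{C}^N)^{1/2}\xi^N\rangle$ so that

$\mathbb{E}^{\pi^N}[\langle\mathcal{C}^N\nabla\Psi^N(x), (\mathcal{C}^N)^{1/2}\xi^N\rangle_{\mathcal{C}^N}^{2q}] \lesssim 1.$

By Lemma 4.3, one can suppose $x \sim \pi_0$,

$\langle x, (\mathcal{C}^N)^{1/2}\xi^N\rangle_{\mathcal{C}^N} \sim \sum_{j=1}^{N} \rho_j\,\xi_j,$

where $\rho_1,\dots,\rho_N$ are i.i.d. $\mathrm{N}(0,1)$ Gaussian random variables. Consequently
$(\mathbb{E}^{\pi^N}[\langle x, (\mathcal{C}^N)^{1/2}\xi^N\rangle_{\mathcal{C}^N}^{2q}])^{1/(2q)} = O(N^{1/2})$, which proves that

$(\mathbb{E}^{\pi^N}[B_3^{2q}])^{1/(2q)} = O(N^{-5/6+1/2}) = O(N^{-1/3}).$

• The bound $\|\nabla\Psi^N(x)\|_{-s} \lesssim 1 + \|x\|_s$ ensures that $(\mathbb{E}^{\pi^N}[B_4^{2q}])^{1/(2q)} = O(N^{-2/3})$.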

Define the quantity $e^N(x,\xi^N) = I_2 + I_3 + B_1 + B_2 + B_3 + B_4$ so that $Q^N$
can also be expressed as

$Q^N(x,\xi^N) = Z^N(x,\xi^N) + i^N(x,\xi^N) + e^N(x,\xi^N).$

Equations (4.8) and (4.11) show that $e^N$ satisfies

$(\mathbb{E}^{\pi^N}[e^N(x,\xi^N)^{2q}])^{1/(2q)} = O(N^{-1/3}).$

We now prove that $i^N$ is $O(N^{-1/6})$. By Lemma 4.3, $\mathbb{E}^{\pi^N}[i^N(x,\xi^N)^{2q}] \lesssim
\mathbb{E}^{\pi_0}[i^N(x,\xi^N)^{2q}]$. If $x \sim \pi_0$, we have

$i^N(x,\xi^N) = \dfrac{\ell^2}{2}\,N^{-2/3}\{\|x\|_{\mathcal{C}^N}^2 - \|(\mathcal{C}^N)^{1/2}\xi^N\|_{\mathcal{C}^N}^2\} = \dfrac{\ell^2}{2}\,N^{-2/3}\sum_{j=1}^{N}(\rho_j^2 - \xi_j^2),$

where $\rho_1,\dots,\rho_N$ are i.i.d. $\mathrm{N}(0,1)$ Gaussian random variables. Since
$\mathbb{E}[\{\sum_{j=1}^{N}(\rho_j^2 - \xi_j^2)\}^{2q}] \lesssim N^{q}$, it follows that

(4.12) $(\mathbb{E}^{\pi^N}[i^N(x,\xi^N)^{2q}])^{1/(2q)} = O(N^{-2/3+1/2}) = O(N^{-1/6}),$

which ends the proof of Lemma 4.4. □

The next lemma quantifies the fact that $Z^N(x,\xi^N)$ is asymptotically independent
from the current position $x$.

Lemma 4.5 (Asymptotic independence). Let $p \ge 1$ be a positive integer
and $f : \mathbb{R} \to \mathbb{R}$ be a 1-Lipschitz function. Consider error terms $e_\star^N(x,\xi)$
satisfying

$\lim_{N\to\infty} \mathbb{E}^{\pi^N}[|e_\star^N(x,\xi^N)|^p] = 0.$

Define the functions $\bar f^N : \mathbb{R} \to \mathbb{R}$ and the constant $\bar f \in \mathbb{R}$ by

$\bar f^N(x) = \mathbb{E}_x[f(Z^N(x,\xi^N) + e_\star^N(x,\xi^N))]$ and $\bar f = \mathbb{E}[f(Z_\ell)].$

Then the function $\bar f^N$ is highly concentrated around its mean in the sense
that

$\lim_{N\to\infty} \mathbb{E}^{\pi^N}[|\bar f^N(x) - \bar f|^p] = 0.$

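Proof. Let $f$ be a 1-Lipschitz function. Define the function $F : \mathbb{R} \times
[0,\infty) \to \mathbb{R}$ by

$F(\mu, \sigma) = \mathbb{E}[f(\rho_{\mu,\sigma})]$ where $\rho_{\mu,\sigma} \sim \mathrm{N}(\mu, \sigma^2)$.

The function $F$ satisfies

(4.13) $|F(\mu_1,\sigma_1) - F(\mu_2,\sigma_2)| \lesssim |\mu_2 - \mu_1| + |\sigma_2 - \sigma_1|,$

for any choice $\mu_1, \mu_2 \in \mathbb{R}$ and $\sigma_1, \sigma_2 \ge 0$. Indeed,

$|F(\mu_1,\sigma_1) - F(\mu_2,\sigma_2)| = |\mathbb{E}[f(\mu_1 + \sigma_1\rho_{0,1}) - f(\mu_2 + \sigma_2\rho_{0,1})]|$
$\qquad \le \mathbb{E}[|\mu_2 - \mu_1| + |\sigma_2 - \sigma_1|\cdot|\rho_{0,1}|]$
$\qquad \lesssim |\mu_2 - \mu_1| + |\sigma_2 - \sigma_1|.$

We have $\mathbb{E}_x[Z^N(x,\xi^N)] = \mathbb{E}[Z_\ell] = -\ell^3/4$ while the variances are given by

$\mathrm{Var}[Z^N(x,\xi^N)] = \dfrac{\ell^3}{2}\,\dfrac{\|x\|_{\mathcal{C}^N}^2}{N}$ and $\mathrm{Var}[Z_\ell] = \dfrac{\ell^3}{2}.$

Therefore, using Lemma 4.3,

$\mathbb{E}^{\pi^N}[|\bar f^N(x) - \bar f|^p]$
$\quad = \mathbb{E}^{\pi^N}[|\mathbb{E}_x[f(Z^N(x,\xi^N) + e_\star^N(x,\xi^N)) - f(Z_\ell)]|^p]$
$\quad \lesssim \mathbb{E}^{\pi^N}[|\mathbb{E}_x[f(Z^N(x,\xi^N)) - f(Z_\ell)]|^p] + \mathbb{E}^{\pi^N}[|e_\star^N(x,\xi^N)|^p]$
$\quad = \mathbb{E}^{\pi^N}\big[\big|F(-\tfrac{\ell^3}{4}, \mathrm{Var}[Z^N(x,\xi^N)]^{1/2}) - F(-\tfrac{\ell^3}{4}, \mathrm{Var}[Z_\ell]^{1/2})\big|^p\big] + \mathbb{E}^{\pi^N}[|e_\star^N(x,\xi^N)|^p]$
$\quad \lesssim \mathbb{E}^{\pi^N}[|\mathrm{Var}[Z^N(x,\xi^N)]^{1/2} - \mathrm{Var}[Z_\ell]^{1/2}|^p] + \mathbb{E}^{\pi^N}[|e_\star^N(x,\xi^N)|^p]$
$\quad \lesssim \mathbb{E}^{\pi_0}\Big[\Big|\Big\{\dfrac{\|x\|_{\mathcal{C}^N}^2}{N}\Big\}^{1/2} - 1\Big|^p\Big] + \mathbb{E}^{\pi^N}[|e_\star^N(x,\xi^N)|^p] \to 0.$

In the last step we have used the fact that if $x \sim \pi_0$, then $\|x\|_{\mathcal{C}^N}^2/N \sim (\rho_1^2 + \cdots + \rho_N^2)/N$
where $\rho_1,\dots,\rho_N$ are i.i.d. Gaussian random variables $\mathrm{N}(0,1)$ so that

$\mathbb{E}^{\pi_0}\big|\{\|x\|_{\mathcal{C}^N}^2/N\}^{1/2} - 1\big|^p \to 0.$ □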

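Corollary 4.6. Let $p \ge 1$ be a positive integer. The local mean acceptance
probability $\alpha^N(x)$, defined in equation (2.19), satisfies

$\lim_{N\to\infty} \mathbb{E}^{\pi^N}[|\alpha^N(x) - \alpha(\ell)|^p] = 0.$

Proof. The function $f(z) = 1 \wedge e^z$ is 1-Lipschitz and $\alpha(\ell) = \mathbb{E}[f(Z_\ell)]$.
Also,

$\alpha^N(x) = \mathbb{E}_x[f(Q^N(x,\xi^N))] = \mathbb{E}_x[f(Z^N(x,\xi^N) + e_\star^N(x,\xi^N))]$

with $e_\star^N(x,\xi^N) = i^N(x,\xi^N) + e^N(x,\xi^N)$. Lemma 4.4 shows that
$\lim_{N\to\infty} \mathbb{E}^{\pi^N}[|e_\star^N(x,\xi)|^p] = 0$, and therefore Lemma 4.5 gives the conclusion.
□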
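
For the particular choice $f(z) = 1 \wedge e^z$ used in Corollary 4.6, the function $F$ of (4.13) has a closed form, which makes the Lipschitz bound easy to check numerically. The sketch below is an illustration rather than part of the proof: it uses the standard Gaussian computation $\mathbb{E}[1 \wedge e^{G}] = \Phi(m/s) + e^{m + s^2/2}\,\Phi(-(m + s^2)/s)$ for $G \sim \mathrm{N}(m, s^2)$, and the function names and grid in the usage lines are assumptions of the example.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def F(m, s):
    """F(m, s) = E[min(1, exp(G))] for G ~ N(m, s^2) with s >= 0."""
    if s == 0.0:
        return min(1.0, math.exp(m))
    accept_up = normal_cdf(m / s)  # P(G >= 0): the proposal is always accepted
    accept_down = math.exp(m + 0.5 * s * s) * normal_cdf(-(m + s * s) / s)
    return accept_up + accept_down  # second term is E[exp(G); G < 0]


# Hypothetical usage: largest violation of (4.13) with constant 1 over a grid.
grid = [(m / 2.0, s / 4.0) for m in range(-4, 3) for s in range(0, 9)]
worst = max(abs(F(m1, s1) - F(m2, s2)) - abs(m1 - m2) - abs(s1 - s2)
            for (m1, s1) in grid for (m2, s2) in grid)
```

With $m = -\ell^3/4$ and $s^2 = \ell^3/2$ the formula reduces to $2\Phi(-s/2)$, which is the limiting acceptance probability $\alpha(\ell)$ of equation (3.2).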

4.3. Drift approximation. This section proves that the approximate drift
function $d^N : \mathcal{H}^s \to \mathcal{H}^s$ defined in equation (3.3) converges to the drift function
$\mu : \mathcal{H}^s \to \mathcal{H}^s$ of the limiting diffusion (2.23).

Lemma 4.7 (Drift approximation). Let Assumption 2.1 hold. The drift
function $d^N : \mathcal{H}^s \to \mathcal{H}^s$ converges to $\mu$ in the sense that

$\lim_{N\to\infty} \mathbb{E}^{\pi^N}[\|d^N(x) - \mu(x)\|_s^2] = 0.$

Proof. The approximate drift $d^N$ is given by equation (3.3). The definition
of the local mean acceptance probability $\alpha^N(x)$, given by equation
(2.19), shows that $d^N$ can also be expressed as

$d^N(x) = (\alpha^N(x)\,\alpha(\ell)^{-1})\,\mu^N(x) + \sqrt{2\ell}\;h(\ell)^{-1}(\Delta t)^{-1/2}\,\varepsilon^N(x),$

where $\mu^N(x) = -(P^N x + \mathcal{C}^N\nabla\Psi^N(x))$, and the term $\varepsilon^N(x)$ is defined by

$\varepsilon^N(x) = \mathbb{E}_x[\gamma^N(x,\xi^N)\,\mathcal{C}^{1/2}\xi^N] = \mathbb{E}_x[(1 \wedge e^{Q^N(x,\xi^N)})\,\mathcal{C}^{1/2}\xi^N].$

To prove Lemma 4.7, it suffices to verify that

(4.14) $\lim_{N\to\infty} \mathbb{E}^{\pi^N}[\|(\alpha^N(x)\,\alpha(\ell)^{-1})\,\mu^N(x) - \mu(x)\|_s^2] = 0,$

(4.15) $\lim_{N\to\infty} (\Delta t)^{-1}\,\mathbb{E}^{\pi^N}[\|\varepsilon^N(x)\|_s^2] = 0.$

• Let us first prove equation (4.14). The triangle inequality and the Cauchy-Schwarz
inequality show that

$(\mathbb{E}^{\pi^N}[\|(\alpha^N(x)\,\alpha(\ell)^{-1})\,\mu^N(x) - \mu(x)\|_s^2])^2$
$\qquad \lesssim \mathbb{E}[|\alpha^N(x) - \alpha(\ell)|^4]\cdot\mathbb{E}^{\pi^N}[\|\mu^N(x)\|_s^4] + \mathbb{E}^{\pi^N}[\|\mu^N(x) - \mu(x)\|_s^4].$

By Remark 2.5, $\mu^N : \mathcal{H}^s \to \mathcal{H}^s$ is Lipschitz, with a Lipschitz constant that
can be chosen independent of $N$. It follows that $\sup_N \mathbb{E}^{\pi^N}[\|\mu^N(x)\|_s^4] < \infty$.
Lemma 4.5 and Corollary 4.6 show that $\mathbb{E}[|\alpha^N(x) - \alpha(\ell)|^4] \to 0$. Therefore,

$\lim_{N\to\infty} \mathbb{E}[|\alpha^N(x) - \alpha(\ell)|^4]\cdot\mathbb{E}^{\pi^N}[\|\mu^N(x)\|_s^4] = 0.$

The functions $\mu^N$ and $\mu$ are globally Lipschitz on $\mathcal{H}^s$, with a Lipschitz constant
that can be chosen independently of $N$, so that $\|\mu^N(x) - \mu(x)\|_s^4 \lesssim
(1 + \|x\|_s^4)$. Lemma 4.1 proves that the sequence of functions $\{\mu^N\}$ converges
$\pi_0$-almost surely to $\mu(x)$ in $\mathcal{H}^s$, and Lemma 4.3 shows that
$\mathbb{E}^{\pi^N}[\|\mu^N(x) - \mu(x)\|_s^4] \lesssim \mathbb{E}^{\pi_0}[\|\mu^N(x) - \mu(x)\|_s^4]$. It thus follows from the
dominated convergence theorem that

$\lim_{N\to\infty} \mathbb{E}^{\pi^N}[\|\mu^N(x) - \mu(x)\|_s^4] = 0.$

This concludes the proof of equation (4.14).

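• Let us prove equation (4.15). If the Bernoulli random variable $\gamma^N(x,\xi^N)$
were independent from the noise term $(\mathcal{C}^N)^{1/2}\xi^N$, it would follow that
$\varepsilon^N(x) = 0$. In general $\gamma^N(x,\xi^N)$ is not independent from $(\mathcal{C}^N)^{1/2}\xi^N$ so that
$\varepsilon^N(x)$ is not equal to zero. Nevertheless, as quantified by Lemma 4.5, the
Bernoulli random variable $\gamma^N(x,\xi^N)$ is asymptotically independent from
the current position $x$ and from the noise term $(\mathcal{C}^N)^{1/2}\xi^N$. Consequently,
we can prove in equation (4.17) that the quantity $\varepsilon^N(x)$ is small. To this
end, we establish that each component $\langle\varepsilon^N(x), \hat\varphi_j\rangle_s^2$ satisfies

(4.16) $\mathbb{E}^{\pi^N}[\langle\varepsilon^N(x), \hat\varphi_j\rangle_s^2] \lesssim N^{-1}\,\mathbb{E}^{\pi^N}[\langle x, \hat\varphi_j\rangle_s^2] + N^{-2/3}(j^s\lambda_j)^2.$

Summation of equation (4.16) over $j = 1,\dots,N$ leads to

(4.17) $\mathbb{E}^{\pi^N}[\|\varepsilon^N(x)\|_s^2] \lesssim N^{-1}\,\mathbb{E}^{\pi^N}[\|x\|_s^2] + N^{-2/3}\,\mathrm{Tr}_{\mathcal{H}^s}(\mathcal{C}_s) \lesssim N^{-2/3},$

which gives the proof of equation (4.15). To prove equation (4.16) for a
fixed index $j \in \mathbb{N}$, the quantity $Q^N(x,\xi)$ is decomposed as a sum of a term
independent from $\xi_j$ and another remaining term of small magnitude.
To this end we introduce

(4.18) $Q^N(x,\xi^N) = Q_j^N(x,\xi^N) + Q_{j,\perp}^N(x,\xi^N),$
$\qquad\quad\ Q_j^N(x,\xi^N) = -\dfrac{\ell^{3/2}}{\sqrt{2}}\,N^{-1/2}\lambda_j^{-1}x_j\,\xi_j - \dfrac{\ell^2}{2}\,N^{-2/3}\xi_j^2 + e^N(x,\xi^N).$

The definitions of $Z^N(x,\xi^N)$ and $i^N(x,\xi^N)$ in equations (4.5) and (4.6)
readily show that $Q_{j,\perp}^N(x,\xi^N)$ is independent from $\xi_j$. The noise term satisfies
$\mathcal{C}^{1/2}\xi^N = \sum_{j=1}^{N}(j^s\lambda_j)\,\xi_j\,\hat\varphi_j$. Since $Q_{j,\perp}^N(x,\xi^N)$ and $\xi_j$ are independent
and $z \mapsto 1 \wedge e^z$ is 1-Lipschitz, it follows that

$\langle\varepsilon^N(x), \hat\varphi_j\rangle_s^2 = (j^s\lambda_j)^2\big(\mathbb{E}_x[(1 \wedge e^{Q^N(x,\xi^N)})\,\xi_j]\big)^2$
$\qquad = (j^s\lambda_j)^2\big(\mathbb{E}_x[[(1 \wedge e^{Q^N(x,\xi^N)}) - (1 \wedge e^{Q_{j,\perp}^N(x,\xi^N)})]\,\xi_j]\big)^2$
$\qquad \lesssim (j^s\lambda_j)^2\,\mathbb{E}_x[|Q^N(x,\xi^N) - Q_{j,\perp}^N(x,\xi^N)|^2]$
$\qquad = (j^s\lambda_j)^2\,\mathbb{E}_x[Q_j^N(x,\xi^N)^2].$

By Lemma 4.4, $\mathbb{E}^{\pi^N}[e^N(x,\xi)^2] \lesssim N^{-2/3}$. Therefore,

$(j^s\lambda_j)^2\,\mathbb{E}^{\pi^N}[Q_j^N(x,\xi^N)^2]$
$\qquad \lesssim (j^s\lambda_j)^2\{N^{-1}\lambda_j^{-2}\,\mathbb{E}^{\pi^N}[x_j^2\xi_j^2] + N^{-4/3}\,\mathbb{E}^{\pi^N}[\xi_j^4] + \mathbb{E}^{\pi^N}[e^N(x,\xi)^2]\}$
$\qquad \lesssim N^{-1}\,\mathbb{E}^{\pi^N}[(j^s x_j)^2] + (j^s\lambda_j)^2(N^{-4/3} + N^{-2/3})$
$\qquad \lesssim N^{-1}\,\mathbb{E}^{\pi^N}[\langle x, \hat\varphi_j\rangle_s^2] + (j^s\lambda_j)^2 N^{-2/3},$

which finishes the proof of equation (4.16). □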

4.4. Noise approximation. Recall definition (3.4) of the martingale dif- 
ference r'^'^. In this section we estimate the error in the approximation 
Yk,N ^ N(0,Cs). To this end we introduce the covariance operator 

D^'ix) = E^ir'^'^ 0w r'^'^^ix'^'^ = x]. 

For any x,u,v Ti^, the operator D^{x) satisfies 

E[{r'''^,u)s{r''''',v),\x'''^ = x] = {u,D''{x)v)s. 
The next lemma gives a quantitative version of the approximation D^{x) ~ Cs- 
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Lemma 4.8. Let Assumption 2.1 hold. For any pair of indices i,j > 0, 
the operator (x) : Ti^ — )■ satisfies 

(4.19) lim E''''\{^i,D^{x)0J)s-{^^,Cs^J)s\=O 
and, furthermore, 

(4.20) lim E^^^l Tr^.(L»^(x)) - Tr^.(Cs)| = 0. 
Proof. The martingale difference T^{x,^) is given by 

r^(x,e) = a(^)-^/S^(x,0c^/2C 

(4.21) 

+ -^a{£)-y\£Aty/^j^{x,Of^''(.x)-a{i)d^{x)}. 

We only prove equation (4.20); the proof of equation (4.19) is essentially 
identical, but easier. Remark 2.5 shows that the functions fi,fi^ -.T-L^ -^T-L^ 
are globally Lipschitz, and Lemma 4.7 shows that E'^ [||d^(a;) — At(a^)||s] — ?• 0. 
Therefore 

(4.22) E-"[||7^(x,0/^^(x) - a(^)d^(x)||2] < 1, 

which implies that the second term on the right-hand side of equation (4.21) 
is 0{^/Ai). Since Tr^.(L»^(x)) = E^[||r^(x,C)||^], equation (4.22) implies 
that 

E'^"[|«(^)Tr^.(Z^^(x)) -E.[||7^(x,e)Ci/'ell?]|] < (At)^/^. 
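Consequently, to prove equation (4.20), it suffices to verify that

$$\lim_{N\to\infty} \mathbb{E}^{\pi^N}\Big[\big|\mathbb{E}_x\big[\|\gamma^N(x,\xi)\, C^{1/2}\xi\|_s^2\big] - \alpha(\ell)\operatorname{Tr}_{\mathcal{H}^s}(C_s)\big|\Big] = 0. \tag{4.23}$$

We have $\mathbb{E}_x\big[\|\gamma^N(x,\xi)\, C^{1/2}\xi\|_s^2\big] = \sum_{j=1}^N (j^s\lambda_j)^2\, \mathbb{E}_x\big[(1\wedge e^{Q^N(x,\xi)})\,\xi_j^2\big]$. Therefore,
to prove equation (4.23), it suffices to establish

$$\lim_{N\to\infty} \sum_{j=1}^N (j^s\lambda_j)^2\, \mathbb{E}^{\pi^N}\Big[\big|\mathbb{E}_x\big[(1\wedge e^{Q^N(x,\xi)})\,\xi_j^2\big] - \alpha(\ell)\big|\Big] = 0. \tag{4.24}$$

Since $\sum_{j=1}^{\infty} (j^s\lambda_j)^2 < \infty$ and $|1 \wedge e^{Q^N(x,\xi)}| \le 1$, the dominated convergence
theorem shows that (4.24) follows from

$$\lim_{N\to\infty} \mathbb{E}^{\pi^N}\Big[\big|\mathbb{E}_x\big[(1\wedge e^{Q^N(x,\xi)})\,\xi_j^2\big] - \alpha(\ell)\big|\Big] = 0 \qquad \forall j > 0. \tag{4.25}$$

We now prove equation (4.25). As in the proof of Lemma 4.7, we use the
decomposition $Q^N(x,\xi) = Q^N_j(x,\xi) + Q^N_{j,\perp}(x,\xi)$, where $Q^N_{j,\perp}(x,\xi)$ is independent
of $\xi_j$. Therefore, since $\operatorname{Lip}(f) = 1$,

\begin{align*}
\mathbb{E}_x\big[(1\wedge e^{Q^N(x,\xi)})\,\xi_j^2\big]
 &= \mathbb{E}_x\big[(1\wedge e^{Q^N_{j,\perp}(x,\xi)})\,\xi_j^2\big] + \mathbb{E}_x\big[\big[(1\wedge e^{Q^N(x,\xi)}) - (1\wedge e^{Q^N_{j,\perp}(x,\xi)})\big]\,\xi_j^2\big] \\
 &= \mathbb{E}_x\big[1\wedge e^{Q^N_{j,\perp}(x,\xi)}\big] + O\big(\{\mathbb{E}_x[|Q^N(x,\xi) - Q^N_{j,\perp}(x,\xi)|^2]\}^{1/2}\big) \\
 &= \mathbb{E}_x\big[1\wedge e^{Q^N_{j,\perp}(x,\xi)}\big] + O\big(\{\mathbb{E}_x[Q^N_j(x,\xi)^2]\}^{1/2}\big).
\end{align*}

Lemma 4.5 ensures that, for $f(\cdot) = 1 \wedge \exp(\cdot)$,

$$\lim_{N\to\infty} \mathbb{E}^{\pi^N}\Big[\big|\mathbb{E}_x\big[f(Q^N_{j,\perp}(x,\xi))\big] - \alpha(\ell)\big|\Big] = 0,$$

and the definition of $Q^N_j(x,\xi)$ readily shows that $\lim_{N\to\infty} \mathbb{E}^{\pi^N}\big[Q^N_j(x,\xi)^2\big] = 0$.
This concludes the proof of equation (4.25) and thus ends the proof of
Lemma 4.8. □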
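The limit (4.25) says that weighting the acceptance probability by $\xi_j^2$ does not change its average in high dimension. The Monte Carlo sketch below illustrates this in a toy case only: it assumes $\Psi = 0$, so that in whitened coordinates the target is a standard Gaussian, and it assumes the proposal $y = (1-\delta) z + \sqrt{2\delta}\,\xi$ with $\delta = \ell N^{-1/3}$, for which the log acceptance ratio reduces to $Q = (\delta/4)(|z|^2 - |y|^2)$. The function name, sample sizes and seed are hypothetical choices made for illustration.

```python
import numpy as np

def acceptance_weights(n_dim, ell=1.0, n_samples=10000, seed=0):
    """Sample min(1, e^Q) and the noise xi for a N(0, I) target in stationarity.

    Assumed toy setting: proposal y = (1 - delta) z + sqrt(2 delta) xi with
    delta = ell * n_dim**(-1/3); the log acceptance ratio then reduces to
    Q = (delta / 4) * (|z|^2 - |y|^2).
    """
    rng = np.random.default_rng(seed)
    delta = ell * n_dim ** (-1.0 / 3.0)
    z = rng.standard_normal((n_samples, n_dim))    # current state, drawn from the target
    xi = rng.standard_normal((n_samples, n_dim))   # proposal noise
    y = (1.0 - delta) * z + np.sqrt(2.0 * delta) * xi
    log_ratio = 0.25 * delta * (np.sum(z * z, axis=1) - np.sum(y * y, axis=1))
    accept = np.exp(np.minimum(0.0, log_ratio))    # equals min(1, e^Q), no overflow
    return accept, xi

accept, xi = acceptance_weights(n_dim=200)
plain = float(np.mean(accept))                     # estimate of E[min(1, e^Q)]
weighted = float(np.mean(accept * xi[:, 0] ** 2))  # estimate of E[min(1, e^Q) xi_1^2]
print(f"plain={plain:.3f}  weighted={weighted:.3f}")
```

The two estimates should be close to each other, up to Monte Carlo error and a finite-dimension bias, because a single coordinate $\xi_j$ contributes only a vanishing share of $Q$.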

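Corollary 4.9. More generally, for any fixed vector $h \in \mathcal{H}^s$, the following
limit holds:

$$\lim_{N\to\infty} \mathbb{E}^{\pi^N} \big| \langle h, D^N(x) h\rangle_s - \langle h, C_s h\rangle_s \big| = 0. \tag{4.26}$$

Proof. If $h = \hat\varphi_i$, this is precisely the content of Proposition 3.1. More
generally, by linearity, Proposition 3.1 shows that this is true for $h = \sum_{j \le N} \alpha_j \hat\varphi_j$,
where $N \in \mathbb{N}$ is a fixed integer. For a general vector $h \in \mathcal{H}^s$, we can use the
decomposition $h = h^* + e^*$, where $h^* = \sum_{j \le N} \langle h, \hat\varphi_j\rangle_s\, \hat\varphi_j$ and $e^* = h - h^*$. It
follows that

\begin{align*}
&\big|\big(\langle h, D^N(x) h\rangle_s - \langle h, C_s h\rangle_s\big) - \big(\langle h^*, D^N(x) h^*\rangle_s - \langle h^*, C_s h^*\rangle_s\big)\big| \\
&\qquad \le \big|\langle h + h^*, D^N(x)(h - h^*)\rangle_s - \langle h + h^*, C_s (h - h^*)\rangle_s\big| \\
&\qquad \le 2\|h\|_s \cdot \|h - h^*\|_s \cdot \big(\operatorname{Tr}_{\mathcal{H}^s}(D^N(x)) + \operatorname{Tr}_{\mathcal{H}^s}(C_s)\big),
\end{align*}

where we have used the fact that for a nonnegative self-adjoint operator
$D : \mathcal{H}^s \to \mathcal{H}^s$ we have $\langle u, Dv\rangle_s \le \|u\|_s \cdot \|v\|_s \cdot \operatorname{Tr}_{\mathcal{H}^s}(D)$. Proposition 3.1 shows
that $\mathbb{E}^{\pi^N}\big[\operatorname{Tr}_{\mathcal{H}^s}(D^N(x))\big] < \infty$, and Assumption 2.1 ensures that $\operatorname{Tr}_{\mathcal{H}^s}(C_s) < \infty$.
Consequently,

\begin{align*}
\lim_{N\to\infty} \mathbb{E}^{\pi^N}\big|\langle h, D^N(x) h\rangle_s - \langle h, C_s h\rangle_s\big|
 &\lesssim \lim_{N\to\infty} \mathbb{E}^{\pi^N}\big|\langle h^*, D^N(x) h^*\rangle_s - \langle h^*, C_s h^*\rangle_s\big| + \|h - h^*\|_s \\
 &= \|h - h^*\|_s.
\end{align*}

Since $\|h - h^*\|_s$ can be chosen arbitrarily small, the conclusion follows. □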
4.5. Martingale invariance principle. This section proves that the process
$W^N$ defined in equation (3.7) converges to a Brownian motion.

Proposition 4.10. Let Assumption 2.1 hold. Let $z^0 \sim \pi$, let $W^N(t)$ be
the process defined in equation (3.7), and let $x^{0,N} \sim \pi^N$ be the starting position
of the Markov chain $\{x^{k,N}\}$. Then

$$(x^{0,N}, W^N) \Longrightarrow (z^0, W), \tag{4.27}$$

where $\Longrightarrow$ denotes weak convergence in $\mathcal{H}^s \times C([0,T]; \mathcal{H}^s)$, and $W$ is an
$\mathcal{H}^s$-valued Brownian motion with covariance operator $C_s$. Furthermore, the
limiting Brownian motion $W$ is independent of the initial condition $z^0$.

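Proof. As a first step, we show that $W^N$ converges weakly to $W$. As
described in [19], a consequence of Proposition 5.1 of [3] shows that in order
to prove that $W^N$ converges weakly to $W$ in $C([0,T]; \mathcal{H}^s)$, it suffices to
prove that for any $t \in [0,T]$ and any pair of indices $i, j > 0$ the following
three limits hold in probability, the third for any $\varepsilon > 0$:

$$\lim_{N\to\infty} \Delta t \sum_{k=1}^{k_N(T)} \mathbb{E}\big[\|\Gamma^{k,N}\|_s^2 \,\big|\, \mathcal{F}^{k,N}\big] = T \operatorname{Tr}_{\mathcal{H}^s}(C_s), \tag{4.28}$$

$$\lim_{N\to\infty} \Delta t \sum_{k=1}^{k_N(t)} \mathbb{E}\big[\langle \Gamma^{k,N}, \hat\varphi_i\rangle_s \langle \Gamma^{k,N}, \hat\varphi_j\rangle_s \,\big|\, \mathcal{F}^{k,N}\big] = t\, \langle \hat\varphi_i, C_s \hat\varphi_j\rangle_s, \tag{4.29}$$

$$\lim_{N\to\infty} \Delta t \sum_{k=1}^{k_N(T)} \mathbb{E}\big[\|\Gamma^{k,N}\|_s^2\, \mathbf{1}_{\{\|\Gamma^{k,N}\|_s^2 \ge \Delta t^{-1} \varepsilon\}} \,\big|\, \mathcal{F}^{k,N}\big] = 0, \tag{4.30}$$

where $k_N(t) = \lfloor t / \Delta t \rfloor$, $\{\hat\varphi_j\}$ is an orthonormal basis of $\mathcal{H}^s$ and $\mathcal{F}^{k,N}$ is the
natural filtration of the Markov chain $\{x^{k,N}\}$. The proof follows from the
estimate on $D^N(x) = \mathbb{E}\big[\Gamma^{k,N} \otimes \Gamma^{k,N} \,\big|\, x^{k,N} = x\big]$ presented in Lemma 4.8. For
the sake of simplicity, we will write $\mathbb{E}_k[\cdot]$ instead of $\mathbb{E}[\cdot \,|\, \mathcal{F}^{k,N}]$. We now prove
that the three conditions are satisfied.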



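Condition (4.28). It is enough to prove that

$$\lim_{N\to\infty} \mathbb{E}^{\pi^N} \bigg| \frac{1}{\lfloor N^{1/3} \rfloor} \sum_{k=1}^{\lfloor N^{1/3} \rfloor} \mathbb{E}_k\big[\|\Gamma^{k,N}\|_s^2\big] - \operatorname{Tr}_{\mathcal{H}^s}(C_s) \bigg| = 0,$$

where

$$\mathbb{E}_k\big[\|\Gamma^{k,N}\|_s^2\big] = \sum_{j=1}^{N} \langle \hat\varphi_j, D^N(x^{k,N}) \hat\varphi_j\rangle_s = \operatorname{Tr}_{\mathcal{H}^s}\big(D^N(x^{k,N})\big).$$

Because the Metropolis-Hastings algorithm preserves stationarity and
$x^{0,N} \sim \pi^N$, it follows that $x^{k,N} \sim \pi^N$ for any $k \ge 0$. Therefore, for all
$k \ge 0$, we have $\operatorname{Tr}_{\mathcal{H}^s}\big(D^N(x^{k,N})\big) \overset{D}{\sim} \operatorname{Tr}_{\mathcal{H}^s}\big(D^N(x)\big)$ where $x \overset{D}{\sim} \pi^N$.
Consequently, the triangle inequality shows that

$$\mathbb{E}^{\pi^N} \bigg| \frac{1}{\lfloor N^{1/3} \rfloor} \sum_{k=1}^{\lfloor N^{1/3} \rfloor} \mathbb{E}_k\big[\|\Gamma^{k,N}\|_s^2\big] - \operatorname{Tr}_{\mathcal{H}^s}(C_s) \bigg| \le \mathbb{E}^{\pi^N} \big| \operatorname{Tr}_{\mathcal{H}^s}(D^N(x)) - \operatorname{Tr}_{\mathcal{H}^s}(C_s) \big| \to 0,$$

where the last limit follows from Lemma 4.8.
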
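Condition (4.29). It is enough to prove that

$$\lim_{N\to\infty} \mathbb{E}^{\pi^N} \bigg| \frac{1}{\lfloor N^{1/3} \rfloor} \sum_{k=1}^{\lfloor N^{1/3} \rfloor} \mathbb{E}_k\big[\langle \Gamma^{k,N}, \hat\varphi_i\rangle_s \langle \Gamma^{k,N}, \hat\varphi_j\rangle_s\big] - \langle \hat\varphi_i, C_s \hat\varphi_j\rangle_s \bigg| = 0,$$

where $\mathbb{E}_k\big[\langle \Gamma^{k,N}, \hat\varphi_i\rangle_s \langle \Gamma^{k,N}, \hat\varphi_j\rangle_s\big] = \langle \hat\varphi_i, D^N(x^{k,N}) \hat\varphi_j\rangle_s$. Because $x^{k,N} \sim \pi^N$,
the conclusion again follows from Lemma 4.8.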

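Condition (4.30). For all $k \ge 1$, we have $x^{k,N} \sim \pi^N$, so that

$$\mathbb{E}^{\pi^N} \bigg| \frac{1}{\lfloor N^{1/3} \rfloor} \sum_{k=1}^{\lfloor N^{1/3} \rfloor} \mathbb{E}_k\big[\|\Gamma^{k,N}\|_s^2\, \mathbf{1}_{\{\|\Gamma^{k,N}\|_s^2 \ge N^{1/3} \varepsilon\}}\big] \bigg| \le \mathbb{E}^{\pi^N}\big[\|\Gamma^{0,N}\|_s^2\, \mathbf{1}_{\{\|\Gamma^{0,N}\|_s^2 \ge N^{1/3} \varepsilon\}}\big].$$

Equation (4.21) shows that for any power $p > 0$, we have $\sup_N \mathbb{E}^{\pi^N}\big[\|\Gamma^{0,N}\|_s^p\big] < \infty$.
Therefore the sequence $\{\|\Gamma^{0,N}\|_s^2\}$ is uniformly integrable, which shows
that

$$\lim_{N\to\infty} \mathbb{E}^{\pi^N}\big[\|\Gamma^{0,N}\|_s^2\, \mathbf{1}_{\{\|\Gamma^{0,N}\|_s^2 \ge N^{1/3} \varepsilon\}}\big] = 0.$$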
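The uniform integrability step can be made explicit with a Markov-type bound. As a sketch, write $X_N = \|\Gamma^{0,N}\|_s^2$ and $K_N = N^{1/3}\varepsilon$ (shorthand introduced here, not in the original), and apply the moment bound with $p = 4$:

```latex
\begin{align*}
\mathbb{E}^{\pi^N}\big[X_N \mathbf{1}_{\{X_N \ge K_N\}}\big]
  \le \mathbb{E}^{\pi^N}\Big[X_N \cdot \frac{X_N}{K_N}\Big]
  = \frac{\mathbb{E}^{\pi^N}\big[\|\Gamma^{0,N}\|_s^4\big]}{N^{1/3}\varepsilon}
  \le \frac{\sup_{M} \mathbb{E}^{\pi^M}\big[\|\Gamma^{0,M}\|_s^4\big]}{N^{1/3}\varepsilon}
  \xrightarrow[N\to\infty]{} 0 .
\end{align*}
```

The first inequality holds because $X_N / K_N \ge 1$ on the event $\{X_N \ge K_N\}$.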



The three hypotheses are satisfied, proving that $W^N$ converges weakly in
$C([0,T]; \mathcal{H}^s)$ to a Brownian motion $W$ in $\mathcal{H}^s$ with covariance $C_s$. Therefore,
Corollary 4.4 of [19] shows that the sequence $\{(x^{0,N}, W^N)\}_{N \ge 1}$ converges
weakly to $(z^0, W)$ in $\mathcal{H}^s \times C([0,T]; \mathcal{H}^s)$. This completes the proof of
Proposition 4.10. □

5. Conclusion. We have studied the application of the MALA algorithm
to sample from measures defined via density with respect to a Gaussian
measure on Hilbert space. We prove that a suitably interpolated and scaled
version of the Markov chain has a diffusion limit in infinite dimensions.
There are two main conclusions which follow from this theory: first, this
work shows that, in stationarity, the MALA algorithm applied to an
$N$-dimensional approximation of the target will take $O(N^{1/3})$ steps to explore
the invariant measure; second, the MALA algorithm will be optimized at
an average acceptance probability of 0.574. We have thus significantly
extended the work [23] which reaches similar conclusions in the case of i.i.d.
product targets. In contrast we have considered target measures with
significant correlation, with structure motivated by a range of applications.
As a consequence our limit theorems are in an infinite dimensional Hilbert
space, and we have employed an approach to the derivation of the diffusion
limit which differs significantly from that used in [23]. This approach was
developed in [19] to study diffusion limits for the RWM algorithm.
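The optimal acceptance probability quoted above can be reproduced numerically. The sketch below is illustrative only and rests on two assumptions not restated in this section, which should be checked against the definitions earlier in the paper: that the limiting acceptance probability is $\alpha(\ell) = \mathbb{E}[1 \wedge e^{Z_\ell}]$ with $Z_\ell \sim N(-\ell^3/4, \ell^3/2)$, and that the speed of the limiting diffusion is $h(\ell) = \ell\,\alpha(\ell)$. The helper names are hypothetical.

```python
import math

def acceptance(ell):
    """alpha(ell) = E[min(1, e^Z)], Z ~ N(-ell^3/4, ell^3/2) (assumed form).

    For Z ~ N(-v/2, v) this expectation equals 2 * Phi(-sqrt(v) / 2),
    where Phi is the standard normal CDF.
    """
    sigma = math.sqrt(ell ** 3 / 2.0)
    return math.erfc(sigma / (2.0 * math.sqrt(2.0)))  # 2 * Phi(-sigma / 2)

def speed(ell):
    """Assumed speed function h(ell) = ell * alpha(ell)."""
    return ell * acceptance(ell)

def argmax_golden(f, lo, hi, tol=1e-10):
    """Golden-section search for the maximiser of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d   # maximiser lies in [a, d]
        else:
            a = c   # maximiser lies in [c, b]
    return 0.5 * (a + b)

ell_opt = argmax_golden(speed, 0.1, 5.0)
alpha_opt = acceptance(ell_opt)
print(f"optimal ell ~ {ell_opt:.3f}, acceptance ~ {alpha_opt:.3f}")
```

Under these assumptions the acceptance probability at the maximiser should land near 0.574. The value does not depend on the constants in the variance of $Z_\ell$: rescaling $\ell$ changes the optimal $\ell$ but not the acceptance probability at the optimum.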

There are many possible developments of this work. We list several of 
these. 

• In [4] it is shown that the Hybrid Monte Carlo algorithm (HMC) requires,
for target measures of the form (1.1), $O(N^{1/4})$ steps to explore the
invariant measure. However, there is no diffusion limit in this case. Identifying
an appropriate limit, and extending analysis to the case of target measures
(2.11), provides a challenging avenue for exploration.

• In the i.i.d. product case, it is known that if the Markov chain is started
"far" from stationarity, a fluid limit (ODE) is observed [11]. It would be
interesting to study such limits in the present context.

• Combining the analysis of MCMC methods for hierarchical target
measures [2] with the analysis herein provides a challenging set of theoretical
questions, as well as having direct applicability.

• It should also be noted that, for measures absolutely continuous with
respect to a Gaussian, there exist new nonstandard versions of RWM [8],
MALA [7] and HMC [5] for which the acceptance probability does not
degenerate to zero as dimension increases. These methods may be
expensive to implement when the Karhunen-Loeve basis is not known explicitly,
and comparing their overall efficiency with that of standard RWM, MALA
and HMC is an interesting area for further study.

• It is natural to ask whether analysis similar to that undertaken here could
be developed for Metropolis-Hastings methods applied to other reference
measures with a non-Gaussian product structure. In particular, the Besov
priors of [18] provide an interesting class of such reference measures, and
the paper [13] provides a machinery for analyzing change of measure from
the Besov prior, analogous to that used here in the Gaussian case.
Another interesting class of reference measures are those used in the study
of uncertainty quantification for elliptic PDEs: these have the form of an
infinite product of compactly supported uniform distributions; see [25].